EDITORIAL STAFF 


EDITOR
WILLIAM KRUSKAL


ASSOCIATE EDITORS
ALLAN BIRNBAUM DONALD A. DARLING OSCAR KEMPTHORNE
Z. W. BIRNBAUM WASSILY HOEFFDING E. L. LEHMANN
N. L. JOHNSON


WITH THE COOPERATION OF

J. R. Blum, Cyrus Derman, Solomon Kullback
R. C. Bose, J. L. Doob, Eugene Lukacs
D. L. Burkholder, Meyer Dwass, G. E. Noether
D. G. Chapman, D. A. S. Fraser, Howard Raiffa
W. S. Connor, Samuel Karlin, H. E. Robbins
D. R. Cox, Harry Kesten, Walter L. Smith
J. F. Daly, C. H. Kraft, Lionel Weiss


PAST EDITORS OF THE ANNALS

H. C. Carver, 1930-1938; T. W. Anderson, 1950-1952
S. S. Wilks, 1938-1949; E. L. Lehmann, 1953-1955
T. E. Harris, 1955-1958


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


IMS INSTITUTIONAL MEMBERS 


American Viscose Corporation, Marcus Hook, Pennsylvania

Bell Telephone Laboratories, Inc., Technical Library, 463 West Street, New York 14, New York

Committee on Statistics, Indiana University, Bloomington, Indiana

International Business Machines Corporation, New York

Iowa State College, Statistical Laboratory, Ames, Iowa

Lockheed Aircraft Corporation, Burbank, California

Massachusetts Institute of Technology, Hayden Library, Periodical Department, Cambridge 39, Massachusetts

Michigan State University, Department of Statistics, East Lansing, Michigan

National Security Agency, Fort George G. Meade, Maryland

Princeton University, Department of Mathematics, Section of Mathematical Statistics, Princeton, New Jersey

Purdue University, Lafayette, Indiana

State University of Iowa, Iowa City, Iowa

The Catholic University of America, Statistical Laboratory, Department of Mathematics, Washington, D. C.

The Ramo-Wooldridge Corporation, Los Angeles, California

University of California, Statistical Laboratory, Berkeley, California

University of Illinois, Urbana, Illinois

University of North Carolina, Department of Statistics, Chapel Hill, North Carolina

University of Washington, Laboratory of Statistical Research, Seattle, Washington








OPTIMUM DESIGNS IN REGRESSION PROBLEMS
By J. KIEFER¹ AND J. WOLFOWITZ²
Cornell University 


1. Introduction and Summary. Although regression problems have been 
considered by workers in all sciences for many years, until recently relatively 
little attention has been paid to the optimum design of experiments in such 
problems. At what values of the independent variable should one take observa- 
tions, and in what proportions? The purpose of this paper is to develop useful 
computational procedures for finding optimum designs in regression problems of 
estimation, testing hypotheses, etc. In Section 2 we shall develop the theory for 
the case where the desired inference concerns just one of the regression coeffi- 
cients, and illustrative examples will be given in Section 3. In Section 4 the theory 
for the case of inference on several coefficients is developed; here there is a choice 
of several possible optimality criteria, as discussed in [1]. In Section 5 we treat 
the problem of global estimation of the regression function, rather than of the 
individual coefficients. 

We shall now indicate briefly some of the computational aspects of the search for optimum designs by considering the problem of Section 2, wherein the inference concerns one of $k$ regression coefficients. For the sake of concreteness, we shall occasionally refer here to the example of polynomial regression on the real interval $[-1, 1]$, where all observations are independent and have the same variance. The quadratic case is rather trivial to treat by our methods, so we shall sometimes refer here to the case of cubic regression. In the latter case we suppose all four regression coefficients to be unknown, and we want to estimate or test a hypothesis about the coefficient $a_4$ of $x^3$. If a fixed number $N$ of observations is to be taken, we can think of representing the proportion of observations taken at any point $x$ by $\xi(x)$, where $\xi$ is a probability measure on $[-1, 1]$. To a first approximation (which is discussed in Section 2), we can ignore in what follows the fact that $N\xi$ can take only integer values. We consider three methods of attacking the problem of finding an optimum $\xi$:

A. The direct approach is to compute the variance of the best linear estimator of $a_4$ as a function of the values of the independent variable at which observations are taken or, equivalently, as a function of the moments of $\xi$. Denoting by $\mu_i$ the $i$th moment of $\xi$, and assuming $\xi$ to be concentrated entirely on more than three points (so that $a_4$ is estimable), we find easily that the reciprocal of


Received April 21, 1958; revised November 25, 1958. 

1 Research under contract with the Office of Naval Research. 

2 The research of this author was supported by the U. S. Air Force under Contract No. AF 18(600)-685, monitored by the Office of Scientific Research.




this variance is proportional to

$$\mu_6-\frac{\mu_5^2(\mu_2-\mu_1^2)+2\mu_5(\mu_1\mu_2\mu_4+\mu_1\mu_3^2-\mu_2^2\mu_3-\mu_3\mu_4)+\mu_4^3-\mu_2^2\mu_4^2+3\mu_2\mu_3^2\mu_4-2\mu_1\mu_3\mu_4^2-\mu_3^4}{\mu_4(\mu_2-\mu_1^2)-\mu_3^2-\mu_2^3+2\mu_1\mu_2\mu_3}$$

in the case of cubic regression.

The problem is to find a $\xi$ on $[-1, 1]$ which maximizes this expression. Thus, this direct approach leads to a calculation which appears quite formidable. This is true even if one uses the remark on symmetry of the next paragraph and restricts attention to symmetrical $\xi$, so that $\mu_i = 0$ for $i$ odd. For polynomials of higher degree or for regression functions which are not polynomials, the difficulties are greater.
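The displayed expression can be checked numerically. The sketch below is our illustration, not part of the paper; it assumes NumPy, and the helper names `moments`, `closed_form` and `gram_ratio` are hypothetical. It evaluates the expression from the moments of a discrete design and compares it with $\det M_4/\det M_3$, where $M_j = (\mu_{r+s})_{0 \le r,s < j}$ is the moment matrix; that determinant ratio is the standard Schur-complement form of the same quantity.

```python
import numpy as np

def moments(points, weights, top):
    """[mu_0, ..., mu_top] for the discrete design putting weights[j] at points[j]."""
    x = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.array([np.sum(w * x**p) for p in range(top + 1)])

def closed_form(mu):
    """The displayed expression: N / (sigma^2 Var) of the b.l.e. of a_4, cubic regression."""
    m1, m2, m3, m4, m5, m6 = mu[1:7]
    num = (m5**2 * (m2 - m1**2)
           + 2 * m5 * (m1 * m2 * m4 + m1 * m3**2 - m2**2 * m3 - m3 * m4)
           + m4**3 - m2**2 * m4**2 + 3 * m2 * m3**2 * m4
           - 2 * m1 * m3 * m4**2 - m3**4)
    den = m4 * (m2 - m1**2) - m3**2 - m2**3 + 2 * m1 * m2 * m3   # det M_3
    return m6 - num / den

def gram_ratio(mu):
    """Same quantity as det(M_4) / det(M_3), with M_j = (mu_{r+s}), 0 <= r, s < j."""
    def moment_matrix(j):
        return np.array([[mu[r + s] for s in range(j)] for r in range(j)])
    return np.linalg.det(moment_matrix(4)) / np.linalg.det(moment_matrix(3))

# A random asymmetric design on six points of [-1, 1]: the two forms agree.
rng = np.random.default_rng(0)
mu_random = moments(rng.uniform(-1, 1, size=6), rng.dirichlet(np.ones(6)), 6)
print(closed_form(mu_random), gram_ratio(mu_random))

# The symmetric design quoted later in this section gives approximately 1/16.
mu_opt = moments([-1, -0.5, 0.5, 1], [1/6, 1/3, 1/3, 1/6], 6)
print(closed_form(mu_opt))
```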

B. The results of Section 2 yield the following approach to the problem: Let $c_0 + c_1x + c_2x^2$ be a best Chebyshev approximation to $x^3$ on $[-1, 1]$, i.e., such that the maximum over $[-1, 1]$ of $|x^3 - (c_0 + c_1x + c_2x^2)|$ is a minimum over all choices of the $c_i$, and suppose $B$ is the subset of $[-1, 1]$ where the maximum of this absolute value is taken on. Then $\xi$ must give measure one to $B$, and the weights assigned by $\xi$ to the various points of $B$ (there are four in this case) can be found either by solving the linear equations (2.10) or by computing these weights so as to make $\xi$ a maximin strategy for the game discussed in Section 2. Two points should be mentioned:

(1) In the general polynomial case, where there are $k$ parameters ($k = 4$ here), the results described in [10], p. 42, or in Section 2 below imply that there is an optimum $\xi$ concentrated on at most $k$ points. Thus, even if we use this result with the direct approach A, we obtain the following comparison in a $k$-parameter problem in Section 2:

Method A: minimize a nonlinear function of $2k - 1$ real variables.
Method B: solve the Chebyshev problem and then solve $k - 1$ simultaneous linear equations.

The fact that the solution of the Chebyshev problem can often be found in the 
literature (e.g., [2]) makes the comparison of the second method with the first 
all the more favorable. 

(2) Although the computational difficulty cannot in general be reduced further, in the case of polynomial regression on $[-1, 1]$ there is present a kind of symmetry (discussed in Section 2) which implies that there is an optimum $\xi$ which is symmetrical about 0 and which is concentrated on four points; thus, in the case of cubic regression, this fact reduces the computation under Method A to a minimization in 3 variables, but Method B involves only the solution of a single linear equation.
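Point (2) can be made concrete. Since $x^3 - \tfrac34 x = T_3(x)/4$, where $T_3(x) = 4x^3 - 3x$ is the Chebyshev polynomial (a classical fact, not stated in this section), the Chebyshev approximation is $c_0 + c_1x + c_2x^2 = \tfrac34 x$, and the residual $r(x) = x^3 - \tfrac34 x$ equals $\tfrac14, -\tfrac14, \tfrac14, -\tfrac14$ at $1, \tfrac12, -\tfrac12, -1$. For a symmetric $\xi$ on $B$ with $\xi(\pm 1) = w$ and $\xi(\pm\tfrac12) = \tfrac12 - w$, the equations (2.10) for $f_1 = 1$ and $f_3 = x^2$ hold automatically because $r$ is odd, leaving the single linear equation for $f_2 = x$:

$$\int r(x)\,x\,\xi(dx) = 2w\cdot\tfrac14\cdot 1 + 2\bigl(\tfrac12 - w\bigr)\cdot\bigl(-\tfrac14\bigr)\cdot\tfrac12 = \tfrac{3w}{4} - \tfrac18 = 0 \quad\Longrightarrow\quad w = \tfrac16,$$

so that $\xi(\pm 1) = \tfrac16$ and $\xi(\pm\tfrac12) = \tfrac13$.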

C. A third method, which rests on the game-theoretic results of Section 2, and which is especially useful when one has a reasonable guess of what an optimum $\xi$ is, involves the following steps: first, guess a $\xi$, say $\xi^*$, and compute the minimum on the left side of (2.8); second, if this minimum is achieved for $c = c^*$, compute the square of the maximum on the right side of (2.9); then, if these two computations yield the same number, $\xi^*$ is optimum. If one has a guess of a class of $\xi$'s depending on one or several parameters, among which it is thought that there is an optimum $\xi$, then one can maximize over that class at the end of the first step and, the maximum being at $\xi^*$, go through the same analysis as above. This method is illustrated in Example 3.5 and Example 4. Of course, the remarks (1) and (2) of the previous paragraph can be used in applying Method C, as in these examples.

In the example of cubic regression just cited, the optimum procedure turns out to be $\xi(-1) = \xi(1) = 1/6$, $\xi(1/2) = \xi(-1/2) = 1/3$. It is striking that any of the commonly used procedures which take equal numbers of observations at equally spaced points on $[-1, 1]$ requires over 38% more observations than this optimum procedure in order to yield the same variance for the best linear estimator of $a_4$ (see Example 3.1); the comparison is even more striking for higher degree regression. The unique optimum procedure in the case of degree $h$ is given by (3.3).
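The 38% figure can be reproduced with exact rational arithmetic. The sketch below is ours, not the paper's computation; it uses Python's `fractions` module, and the helper names are hypothetical. It computes $N/(\sigma^2 \operatorname{Var})$ for the b.l.e. of $a_4$ as $\det M_4/\det M_3$ (moment matrices as in the identity used above) for the optimum design and for $n$ equally spaced points with equal weights, and reports the ratio of sample sizes needed for equal variance.

```python
from fractions import Fraction

def det(m):
    """Determinant of a small square matrix of Fractions (Gaussian elimination)."""
    a = [row[:] for row in m]
    n, d = len(a), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return d

def info_a4(points, weights):
    """N / (sigma^2 Var) of the b.l.e. of a_4 in cubic regression: det M_4 / det M_3."""
    mu = [sum(w * x**p for x, w in zip(points, weights)) for p in range(7)]
    m4 = [[mu[r + s] for s in range(4)] for r in range(4)]
    m3 = [row[:3] for row in m4[:3]]
    return det(m4) / det(m3)

def equally_spaced(n):
    """n equally spaced points on [-1, 1], each with weight 1/n."""
    pts = [Fraction(-1) + Fraction(2 * i, n - 1) for i in range(n)]
    return pts, [Fraction(1, n)] * n

optimum = info_a4([Fraction(-1), Fraction(-1, 2), Fraction(1, 2), Fraction(1)],
                  [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)])

# Sample-size ratio (equally spaced design vs. optimum) needed for equal variance.
ratios = {n: optimum / info_a4(*equally_spaced(n)) for n in range(4, 21)}
print(optimum, {n: float(r) for n, r in ratios.items()})
```

In this sketch the optimum value is exactly $1/16$; the ratio is $405/256 \approx 1.58$ for four points, smallest for five points ($25/18 \approx 1.389$), and larger for every other $n$ from 4 to 20.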

The comparison of a direct computational attack, analogous to that of A 
above, with the methods developed in Sections 4 and 5 for the problems con- 
sidered there, indicates even more the inferiority of the direct attack. In par- 
ticular cases, e.g., Example 5.1, special methods may prove useful. 

Among recent work in the design of experiments we may mention the papers of Elfving [3], [4], Chernoff [5], Williams [11], Ehrenfeld [12], Guest [13], and Hoel [15]. Only Guest and Hoel explicitly consider computational problems of the kind discussed below. Our methods of employing Chebyshev and game-theoretic results seem to be completely new. The results obtained in the examples below are also new, except for some slight overlap with results of [13] and [15], which is explicitly described below.

We shall consider elsewhere some further problems of the type considered in 
this paper. 


2. The optimum design relative to 1 out of $k$ regression coefficients. Let $f_1, \dots, f_k$ be $k$ real-valued functions on a given space $\mathfrak{X}$. Throughout this section we assume a topology is given on $\mathfrak{X}$ in which

(2.1) $\mathfrak{X}$ is compact; $f_1, \dots, f_k$ are continuous.

We also assume

(2.2) $f_1, \dots, f_k$ are linearly independent on $\mathfrak{X}$.


Since we will be considering a regression problem in which the $f_i$ are known functions and $\sum_i a_i f_i$ is the regression function, (2.2) is really only an assumption of identifiability of the $a_i$ which will avoid trivial circumlocutions. Without some assumption like the first part of (2.1), there may trivially exist procedures which estimate some of the regression coefficients with arbitrarily small variance, as can be seen in the example of estimation of the slope of a straight line on $\mathfrak{X}$ = real line. The assumption of continuity of the $f_i$ can be somewhat weakened, as will be clear from our proofs.







We consider the following regression setup: For any point $x$ (value of the independent variable) in $\mathfrak{X}$, one can observe a random variable $Y_x$ for which

$$\begin{aligned} EY_x &= \sum_{i=1}^{k} a_i f_i(x),\\ \operatorname{Var}(Y_x) &= \sigma^2, \end{aligned} \tag{2.3}$$

where $a = (a_1, \dots, a_k)$ is the vector of regression coefficients, an unknown element of Euclidean $k$-space. The value of $\sigma^2$ will usually be unknown. (The case where $\sigma^2$ can depend on $x$ in a way which is known except for a proportionality constant will be discussed in the last paragraph of this section.) An integer $n$ is given (usually $n > k$), and the experimenter must select a collection $X = (x_1, \dots, x_n)$ of $n$ points in $\mathfrak{X}$ at which the independent random variables $Y_{x_1}, \dots, Y_{x_n}$ are to be observed. The $x_i$ need not be distinct, but if $i \neq j$ and $x_i = x_j$ we shall still, without confusion, write $Y_{x_i}$ and $Y_{x_j}$ for two independent random variables.

Any $X$ can be viewed as a measure $\eta$ on $\mathfrak{X}$ which assigns to each point $x$ a mass equal to the number of $x_i$ in $X$ which are equal to $x$. Dividing this measure by $n$, we obtain a discrete probability measure $\xi$ on $\mathfrak{X}$ which assigns to each point of $\mathfrak{X}$ a measure equal to an integral multiple of $1/n$. In the present section (a similar discussion applying in Sections 4 and 5), we shall be concerned with choosing a $\xi$ (hence, an $X$) to maximize a quantity of the form

$$\min_c \int H_c(x)\,\eta(dx) = \min_c \int nH_c(x)\,\xi(dx), \tag{2.4}$$

where the form of $H_c$ is determined by the problem at hand. The fact that $\xi$ can only take on multiples of $1/n$ as its values makes this problem of maximization quite unwieldy in general. We shall treat, instead, a problem whose solution will sometimes give a solution to the original problem and which will usually give a good approximation to the latter: Find a probability measure $\xi^*$ on $\mathfrak{X}$ for which the right side of (2.4) is a maximum, i.e., we maximize (2.4) with no restriction on $\xi$. Of course, the maximizing $\xi^*$ does not depend on $n$. Thus, if $n$ is such that $n\xi^*$ takes on only integral values, this yields an exact solution $\eta = n\xi^*$ to the original problem. We shall see in Sections 3 and 5 that, in two typical examples, $\xi^*$ takes on only values which are multiples of $1/(2k-2)$ (Example 3.1) or $1/k$ (Example 5.1), so that this situation is not vacuous. Moreover, there will typically be a $\xi^*$ which is concentrated on approximately $k$ points; thus, when $n\xi^*$ does not take on only integral values, obvious integral approximations $\eta'$ to $n\xi^*$ will yield values of (2.4) whose ratio to the maximum tends to 1 as $n \to \infty$ (it is easy to give a bound on the difference of this ratio from unity). Thus, the characterization of a single $\xi^*$ which yields an almost optimum design for all large $n$, in distinction to finding the best $\eta$, which may depend in a complicated fashion on $n$, seems to be of practical value.
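To see the rounding remark at work, the sketch below (assuming NumPy) rounds $n\xi^*$ for the cubic example of Section 1 and reports the efficiency of the rounded design relative to $\xi^*$. The largest-remainder rule in `round_design` is our hypothetical choice of "obvious integral approximation", and both helper names are illustrative.

```python
import numpy as np

def info_top(points, weights, k):
    """Left side of (2.8) for f_i(x) = x^(i-1): weighted least-squares residual of
    x^(k-1) on 1, x, ..., x^(k-2) under the design (points, weights)."""
    x, w = np.asarray(points, dtype=float), np.asarray(weights, dtype=float)
    F = np.vander(x, k, increasing=True) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(F[:, :-1], F[:, -1], rcond=None)
    return float(np.sum((F[:, -1] - F[:, :-1] @ coef) ** 2))

def round_design(xi, n):
    """Hypothetical rounding rule: largest remainder, integer counts summing to n."""
    target = n * np.asarray(xi, dtype=float)
    counts = np.floor(target).astype(int)
    shortfall = n - counts.sum()
    order = np.argsort(-(target - counts), kind="stable")  # largest fractional parts first
    counts[order[:shortfall]] += 1
    return counts

points = [-1.0, -0.5, 0.5, 1.0]
xi_star = [1/6, 1/3, 1/3, 1/6]           # optimum cubic design from Section 1
best = info_top(points, xi_star, 4)      # approximately 1/16
effs = {n: info_top(points, round_design(xi_star, n) / n, 4) / best
        for n in (10, 25, 100, 1000)}
print(effs)
```

For $n = 10$ the rounded design $(2, 3, 3, 2)$ has efficiency $54/55 \approx 0.982$; by $n = 1000$ the loss is below $10^{-4}$.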

We therefore define $\Xi$ to be the space of all discrete probability measures $\xi$ on $\mathfrak{X}$. We could, more generally, specify a Borel field $\mathscr{B}$ on $\mathfrak{X}$ and let $\Xi$ be the class of all probability measures on $(\mathfrak{X}, \mathscr{B})$; however, in all of our applications (see Theorem 2) it will suffice to let $\mathscr{B}$ consist of countable sets and their complements.

In the present section we are concerned with statistical inference about the single parameter $a_k$, where all $a_i$ are assumed unknown. We shall give a precise definition of optimality in the next paragraph. What this definition means is that we restrict ourselves to designs for which $a_k$ is estimable (i.e., for which there exist linear unbiased estimators of $a_k$; in practice, of course, $n$ will have to be suitably large for there to exist such designs), and seek a design for which the linear unbiased estimator of $a_k$ with minimum variance (best linear estimator, or b.l.e.) has a variance which is a minimum over all designs, within the approximation noted two paragraphs above. It is well known that such a design is optimum for problems of point estimation of $a_k$ if the $Y_x$ are assumed to be normal, in the sense that (for example) it yields a minimax procedure for any of a wide variety of weight functions; when the distributions of $Y_x$ are assumed to belong to any larger class, the same result holds for the squared error loss function. For problems of interval estimation and hypothesis testing or m decisions, similar optimality results hold under normality if $\sigma^2$ is known. If $\sigma^2$ is unknown, such results hold provided every design for which $a_k$ is estimable yields as many degrees of freedom to error as does the design we obtain; see Example 3.4 in Section 3 for further discussion.

We now define precisely the term "optimum" as used in this section. There are a few preliminaries. In the original description of a design, let $X$ be a design for which $a_k$ is estimable. Let $h_1, \dots, h_{k-1}$ be numbers such that the function $\bar f_k = f_k - \sum_{i=1}^{k-1} h_i f_i$ is orthogonal to $f_i$ for $i < k$ in the sense that

$$\sum_{r=1}^{n} f_i(x_r)\,\bar f_k(x_r) = 0, \qquad i < k. \tag{2.5}$$

Let $a^* = (a_1^*, \dots, a_k^*)$ be such that $\sum_i a_i f_i = \sum_{i<k} a_i^* f_i + a_k^* \bar f_k$; thus, $a_k^* = a_k$. For the least squares setup in terms of $a^*$, the orthogonality of $\bar f_k$ to the $f_i$ for $i < k$ makes the last of the normal equations take the form

$$\hat a_k^* \sum_{r=1}^{n} [\bar f_k(x_r)]^2 = \sum_{r=1}^{n} \bar f_k(x_r)\, Y_{x_r}, \tag{2.6}$$

so that $\sigma^2$ times the reciprocal of the variance of the b.l.e. of $a_k^* = a_k$ is $\sum_{r=1}^{n} [\bar f_k(x_r)]^2$. Since $\bar f_k$ is orthogonal to $f_1, \dots, f_{k-1}$, this last sum is just the square of the distance of the $n$-vector $(f_k(x_1), \dots, f_k(x_n))$ from the linear space spanned by the vectors $(f_i(x_1), \dots, f_i(x_n))$ for $i < k$, namely,

$$\min_c \sum_{r=1}^{n} \Bigl[f_k(x_r) - \sum_{i=1}^{k-1} c_i f_i(x_r)\Bigr]^2, \tag{2.7}$$

where we have written $c$ for $(c_1, \dots, c_{k-1})$. Since (2.7) is $\sigma^2$ times the inverse of the variance of the b.l.e. of $a_k$, a design $X$ will minimize that variance if it maximizes (2.7). Thus, finally, in terms of the probability measures $\xi$ we have introduced above, we make the following

DEFINITION. A measure $\xi^*$ in $\Xi$ is said to be an optimum design (for the parameter $a_k$) if

$$\min_c \int \Bigl[f_k(x) - \sum_{i=1}^{k-1} c_i f_i(x)\Bigr]^2 \xi^*(dx) = \max_{\xi \in \Xi}\, \min_c \int \Bigl[f_k(x) - \sum_{i=1}^{k-1} c_i f_i(x)\Bigr]^2 \xi(dx). \tag{2.8}$$

For any $\xi$ in $\Xi$, the ratio of the left side of (2.8) (with $\xi$ for $\xi^*$) to the right will be called the efficiency $e(\xi)$ of $\xi$.

Of course, the practical meaning of efficiency is that, if one design has $r$ times the efficiency of a second design, then the latter requires $r$ times as many observations as the former in order to obtain the same value for the left side of (2.4). We note that it is a consequence of this definition that an optimum design is optimum for all values of $\sigma^2$.
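The identity behind (2.6) and (2.7), namely that $\sigma^2$ times the reciprocal of the variance of the b.l.e. of $a_k$ equals the squared distance of the $f_k$ column from the span of the other columns, can be checked numerically. In the sketch below (assuming NumPy) the functions $f_1 = 1$, $f_2 = x$, $f_3 = \sin 3x$ and the random design are arbitrary illustrative choices; the first number uses the usual least-squares covariance $\sigma^2 (F'F)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=8)                            # a design X with n = 8 points
F = np.column_stack([np.ones_like(x), x, np.sin(3 * x)])    # illustrative f_1, f_2, f_3 (k = 3)

# sigma^2 / Var(b.l.e. of a_k) from the least-squares covariance sigma^2 (F'F)^{-1}
via_inverse = 1.0 / np.linalg.inv(F.T @ F)[-1, -1]

# (2.7): squared distance of the f_k column from the span of the other columns
c, *_ = np.linalg.lstsq(F[:, :-1], F[:, -1], rcond=None)
via_distance = float(np.sum((F[:, -1] - F[:, :-1] @ c) ** 2))

print(via_inverse, via_distance)   # agree up to rounding error
```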

The form of (2.8) is very suggestive of a game, and we shall exploit that fact presently. However, the main aspect of our technique for computing an optimum $\xi^*$ has nothing to do with the game formulation, so we treat that aspect first. Our technique is to throw the main computational difficulties into a Chebyshev approximation problem, which can often be solved by standard methods and which, for many important $\{f_i\}$, even has a solution which can be found in the literature. We shall call $c^* = (c_1^*, \dots, c_{k-1}^*)$ a Chebyshev coefficient vector if $\sum_1^{k-1} c_i^* f_i$ is a best approximation to $f_k$ on $\mathfrak{X}$ in the sense of Chebyshev, i.e., in the uniform norm:

$$\min_c \max_{x \in \mathfrak{X}} \Bigl|f_k(x) - \sum_{i=1}^{k-1} c_i f_i(x)\Bigr| = \max_{x \in \mathfrak{X}} \Bigl|f_k(x) - \sum_{i=1}^{k-1} c_i^* f_i(x)\Bigr|. \tag{2.9}$$

Let $m(c^*)$ denote the right side of (2.9), and let $B(c^*)$ be the set of points $x$ for which $|f_k(x) - \sum_1^{k-1} c_i^* f_i(x)| = m(c^*)$. Our first result gives a simple geometric sufficient condition for a $\xi$ to be optimum; this is valid even without the conditions that yield the game-theoretic results of Theorem 2.
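THEOREM 1. If $c^*$ is Chebyshev and $\xi(B(c^*)) = 1$ and

$$\int \Bigl[f_k(x) - \sum_{j=1}^{k-1} c_j^* f_j(x)\Bigr] f_i(x)\, \xi(dx) = 0 \tag{2.10}$$

for $i < k$, then $\xi$ is optimum.

PROOF: According to (2.10), $\sum_1^{k-1} c_j^* f_j$ is the projection relative to $\xi$ of $f_k$ on the linear space spanned by $f_1, \dots, f_{k-1}$. Hence, for any element $\xi'$ of $\Xi$,

$$\begin{aligned} \min_c \int \Bigl[f_k(x) - \sum_{j=1}^{k-1} c_j f_j(x)\Bigr]^2 \xi(dx) &= \int \Bigl[f_k(x) - \sum_{j=1}^{k-1} c_j^* f_j(x)\Bigr]^2 \xi(dx) = [m(c^*)]^2 \\ &\ge \int \Bigl[f_k(x) - \sum_{j=1}^{k-1} c_j^* f_j(x)\Bigr]^2 \xi'(dx) \ge \min_c \int \Bigl[f_k(x) - \sum_{j=1}^{k-1} c_j f_j(x)\Bigr]^2 \xi'(dx) \end{aligned} \tag{2.11}$$

(the second equality holds because $\xi$ is concentrated on $B(c^*)$, and the first inequality because the integrand never exceeds $[m(c^*)]^2$), which proves the desired result.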







The question arises as to whether there always exists a $\xi$ which satisfies the hypotheses of Theorem 1 and whether, in fact, the conditions of the theorem are also necessary for a $\xi$ to be optimum. There also arises the question of whether we can find a useful bound such that there is an optimum $\xi$ which assigns positive probability to at most the number of points given by this bound. These questions can be answered directly algebraically, but since the results we require already appear in the literature in connection with the analysis of certain games, we shall instead consider the following zero-sum two-person game associated with the design problem: player 1 (resp., 2) has $\mathfrak{X}$ (resp., $C$ = Euclidean $(k-1)$-space) as his space of pure strategies; the payoff function is $K(x, c) = [f_k(x) - \sum_1^{k-1} c_i f_i(x)]^2$ (for $\xi$ in $\Xi$ we write $K(\xi, c) = \int K(x, c)\,\xi(dx)$); the space of mixed strategies of player 1 is $\Xi$, while that of player 2 is immaterial, since the convexity of $K$ in $c$ implies, according to Jensen's inequality, that for any randomized strategy of player 2 there is a nonrandomized strategy which is at least as good for all $x$. Of course, the important thing is that an optimum (maximin) strategy for player 1 represents an optimum design. We now state the simple modifications of certain results of [6] which we require.

LEMMA. The game of $\Xi$ vs. $C$ is determined, player 2 has a nonrandomized minimax strategy $c^*$, and player 1 has a maximin strategy $\xi^*$ which is concentrated on at most $k - p$ points, where $p$ is the dimensionality of the convex set of nonrandomized minimax strategies of player 2.

PROOF: Let $C_N$ be the set of all $c$ for which $c'c = \sum c_i^2 \le N^2$, and let $\bar C_N$ be the complement of $C_N$. Since the $f_i$ are linearly independent, there is a finite subset $H$ of $\mathfrak{X}$ such that, for every $c$ with $c'c = 1$, $\sum_1^{k-1} c_i f_i(x)$ is nonzero for at least one $x$ in $H$. Hence, if $\xi'$ assigns positive probability to each $x$ in $H$, we clearly have $\sum_{x \in H} |\sum_i c_i f_i(x)|\, \xi'(x) > \epsilon > 0$ for all $c$ such that $c'c = 1$, and thus this absolute value is $> N\epsilon$ for $c'c = N^2$. Since $f_k$ is bounded, we conclude that $\inf_{c \in \bar C_N} K(\xi', c) \to \infty$ as $N \to \infty$. Hence, there is an $N'$ such that for any $c$ in $\bar C_{N'}$ there is a $c'$ in $C_{N'}$ with $\sup_\xi K(\xi, c') < \sup_\xi K(\xi, c)$. Thus $c^*$ is minimax if and only if $c^* \in C_{N'}$ and $c^*$ is minimax when the space of player 2 is restricted to $C_{N'}$. Since $C_N$ is compact and $K$ is continuous, the game of $\Xi$ vs. $C_N$ is determined, and there exists, for all $N > N'$, a minimax strategy $c^*$ which we can take to be a fixed member of $C_{N'}$. Let $p$ be the dimension of the (convex) set of such minimax strategies in $C_N$. There also exists a maximin strategy $\xi_N^*$ for the game of $\Xi$ vs. $C_N$, and by [6] we can for $N > N'$ take $\xi_N^*$ to be concentrated on at most $k - p$ points. Let $\xi_j = [(j-1)\xi_{N_j}^* + \xi']/j$. Clearly, for each $j$ there is an $N_j$ such that $K(\xi_j, c^*) < K(\xi_j, c)$ for all $c$ in $\bar C_{N_j}$. Thus, since $\xi_{N_j}^*$ is maximal with respect to $c^*$, we have, for $N_j > N'$,

$$\begin{aligned} \sup_{\xi \in \Xi} \inf_{c \in C} K(\xi, c) &\ge \inf_{c \in C} K(\xi_j, c) = \inf_{c \in C_{N_j}} K(\xi_j, c) \ge \Bigl(1 - \frac1j\Bigr) \inf_{c \in C_{N_j}} K(\xi_{N_j}^*, c) = \Bigl(1 - \frac1j\Bigr) K(\xi_{N_j}^*, c^*) \\ &= \Bigl(1 - \frac1j\Bigr) \sup_{\xi} K(\xi, c^*) = \Bigl(1 - \frac1j\Bigr) \inf_{c \in C} \sup_{\xi} K(\xi, c). \end{aligned} \tag{2.12}$$

Letting $j \to \infty$, we see that the game of $\Xi$ vs. $C$ is determined, that $c^*$ is minimax, and that if $\{\xi_{N_{j_i}}^*\}$ is a subsequence of the $\{\xi_{N_j}^*\}$ which converges to a limit $\xi^*$ (which is then also the limit of the $\xi_{j_i}$) concentrated on no more than $k - p$ points (such a subsequence and limit exist, by the compactness of $\mathfrak{X}$) and $c''$ minimizes $K(\xi^*, c)$, we have

$$\sup_{\xi} \inf_{c \in C} K(\xi, c) = \lim_{i \to \infty} \inf_{c \in C} K(\xi_{j_i}, c) \le \lim_{i \to \infty} K(\xi_{j_i}, c'') = K(\xi^*, c'') = \inf_{c \in C} K(\xi^*, c), \tag{2.13}$$

so that $\xi^*$ is maximin. Thus, the lemma is proved.

We mention in passing several other related points: The bound $k - p$ is indicated in [6] not to be the best possible and is reduced under conditions ($c^*$ in the boundary of a compact $C$) for which it is difficult to find general counterparts here. Also, it is evident that $c^*$ is unique ($p = 0$) if $K(x, c)$ is strictly convex in $c$, but strict convexity is clearly not a useful condition in our problem. If $\mathfrak{X}$ is not compact or the $f_i$ are not continuous, suitable assumptions will still imply determinateness, but the other results will have to be stated in terms of $\epsilon$-optimum strategies.

The above lemma indicates one method for trying to compute a $\xi^*$: For simplicity, assume $p = 0$ or that we have no knowledge of $p$. The $\xi$'s on $\mathfrak{X}$ which are concentrated on at most $k$ points form a $(2k-1)$-parameter family. One can thus, in principle, maximize $\min_c K(\xi, c)$ with respect to these $2k - 1$ parameters and obtain an optimum $\xi^*$. As we have indicated in the introduction, this is usually an unrewarding task, and the method indicated in Theorem 1 seems far superior in practical examples. The consequences of the lemma for the method of Theorem 1 may be summarized as follows:

THEOREM 2. If $\xi$ is maximal with respect to $c^*$ while $c^*$ is minimal with respect to $\xi$, then $\xi$ is optimum and $c^*$ is Chebyshev. Every optimum $\xi$ satisfies the conditions of Theorem 1 for every Chebyshev $c^*$. There exists an optimum $\xi$ concentrated on at most $k - p$ points, where $p$ is the dimensionality of the set of Chebyshev vectors.

PROOF: The Chebyshev vectors clearly coincide with the minimax strategies. If $\xi$ is maximin and $c^*$ is minimax, then determinateness implies that $c^*$ is minimal with respect to $\xi$, i.e., $\min_c K(\xi, c) = K(\xi, c^*)$. Thus, $\sum_1^{k-1} c_i^* f_i$ is the projection, relative to $\xi$, of $f_k$ on the linear space spanned by $f_1, \dots, f_{k-1}$, so that (2.10) clearly holds. Since, by (2.11), $\max_\xi \min_c K(\xi, c) = [m(c^*)]^2$ is the value of the game, $\xi(B(c^*)) = 1$. The last assertion of the theorem is taken directly from the lemma, while the first is a general result in the theory of games. We note that any optimum $\xi$ must give measure one to the intersection of all $B(c^*)$ for $c^*$ Chebyshev.

We have mentioned, in Theorem 1 and in the second paragraph below the proof of the lemma, two computational approaches. The first sentence of Theorem 2 indicates a useful approach if one can make a good guess of $\xi$: guess a $\xi'$ and compute $\min_c K(\xi', c) = K(\xi', c')$ (say); compute $\max_x K(x, c')$; if these two are equal, then $\xi'$ is optimum. This is an approach which is standard in game theory and which has proved useful in many examples; it sometimes helps to let $\xi'$ depend on a few parameters, with respect to which one maximizes $\min_c K(\xi', c)$. A comparison of the various methods for obtaining an optimum $\xi$ was given in an example in Section 1.
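The guess-and-verify approach can be sketched for polynomial regression on $[-1, 1]$ as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes NumPy, replaces the maximum over $\mathfrak{X}$ by a maximum over a fine grid, and `method_c_check` is a hypothetical helper name.

```python
import numpy as np

def method_c_check(points, weights, k, grid_size=20001):
    """Guess-and-verify for f_i(x) = x^(i-1) on [-1, 1] (a sketch).

    Step 1: c' minimizes K(xi', c) = sum_j w_j [x_j^(k-1) - sum_{i<k-1} c_i x_j^i]^2.
    Step 2: maximize K(x, c') over a fine grid standing in for [-1, 1].
    The guess xi' is optimum exactly when the two numbers agree.
    """
    x = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    F = np.vander(x, k, increasing=True)          # columns 1, x, ..., x^(k-1)
    sw = np.sqrt(w)
    c_prime, *_ = np.linalg.lstsq(F[:, :-1] * sw[:, None], F[:, -1] * sw, rcond=None)
    lower = float(np.sum(w * (F[:, -1] - F[:, :-1] @ c_prime) ** 2))   # min_c K(xi', c)
    grid = np.linspace(-1.0, 1.0, grid_size)
    G = np.vander(grid, k, increasing=True)
    upper = float(np.max((G[:, -1] - G[:, :-1] @ c_prime) ** 2))      # max_x K(x, c')
    return lower, upper

opt = method_c_check([-1, -0.5, 0.5, 1], [1/6, 1/3, 1/3, 1/6], 4)      # both approximately 1/16
guess = method_c_check(np.linspace(-1, 1, 5), np.full(5, 0.2), 4)     # lower < 1/16 < upper
print(opt, guess)
```

For the cubic design $\xi(\pm 1) = 1/6$, $\xi(\pm 1/2) = 1/3$ both numbers equal $1/16$, confirming optimality; for five equally spaced points with equal weights they are $0.045$ and about $0.091$, and the value $1/16$ of the game lies between them, as it must for any guess.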

In the next section we shall give several examples of the computation of optimum $\xi$'s. We shall not bother to list in detail all of the standard results in approximation theory which are useful in such computations. We mention here for future reference only the classical generalized Chebyshev theorem [2, p. 74], which states that if $\mathfrak{X}$ is a compact real interval and if no nontrivial linear combination of $f_1, \dots, f_{k-1}$ has more than $k - 2$ zeros (in this case, these $f_i$ are called a Chebyshev system), then the Chebyshev vector $c^*$ is unique and is characterized by the fact that there are at least $k$ points at which $f_k - \sum_1^{k-1} c_i^* f_i$ attains its maximum absolute deviation from zero, the maximum being taken on with successive alternations in sign. (The literature contains generalizations of this result to other spaces.)
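For polynomial regression on $[-1, 1]$ with $f_k = x^h$, the Chebyshev problem has the classical solution $x^h - p^*(x) = T_h(x)/2^{h-1}$, where $T_h$ is the Chebyshev polynomial of the first kind (a standard fact, not stated in this section). The sketch below, assuming NumPy's `numpy.polynomial` module, checks the alternation characterization just quoted: the residual is monic of degree $h$, its maximum absolute value is $2^{1-h}$, and it attains $\pm 2^{1-h}$ with alternating signs at the $k = h + 1$ points $\cos(j\pi/h)$.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial

def best_residual_coeffs(h):
    """Power-series coefficients (low to high) of T_h(x) / 2^(h-1). This polynomial is
    monic of degree h, i.e. of the form x^h - (c_0 + c_1 x + ... + c_{h-1} x^(h-1))."""
    return chebyshev.cheb2poly([0] * h + [1]) / 2.0 ** (h - 1)

def check_alternation(h, grid_size=20001):
    """Return (leading coefficient, max |residual| on a grid, residual at cos(j*pi/h), alternates?)."""
    r = best_residual_coeffs(h)
    grid = np.linspace(-1.0, 1.0, grid_size)
    m = np.max(np.abs(polynomial.polyval(grid, r)))   # approximates m(c*)
    z = np.cos(np.pi * np.arange(h + 1) / h)          # k = h + 1 candidate points of B(c*)
    vals = polynomial.polyval(z, r)
    alternates = bool(np.all(np.sign(vals[:-1]) == -np.sign(vals[1:])))
    return r[-1], m, vals, alternates

for h in range(2, 7):
    lead, m, vals, alternates = check_alternation(h)
    print(h, lead, m, 2.0 ** (1 - h), alternates)
```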

Before proceeding further it is relevant here to point out the following connections with earlier results:

1) Elfving [3] considered the special case where $\mathfrak{X}$ contains a finite number of discrete points. It follows from his elegant geometrical argument that the optimum $\xi$ is concentrated on at most $k$ points and satisfies (2.10).

2) Consider the case of polynomial regression ($\mathfrak{X}$ a closed interval of the real line, $f_i(x) = x^{i-1}$). Then $p = 0$ by the Chebyshev theorem cited above. Theorem 2 then says, inter alia, that there exists an optimum $\xi$ concentrated on at most $k$ points. This result (for this important particular case) is already well known in the theory of moment problems ([10], p. 42). It holds identically in $\sigma^2$; if it did not hold for all $\sigma^2$, it would be useless in our problem when $\sigma^2$ is unknown. (This result holds even when (2.18) below is true, with fixed $v$.)

We now give a simple result on the uniqueness of the optimum $\xi^*$.

THEOREM 3. If $\mathfrak{X}$ is a compact real interval, $f_1, \dots, f_{k-1}$ is a Chebyshev system, and $B(c^*)$ contains exactly $k$ points, then the optimum $\xi^*$ is unique.

PROOF: Let $z_1, \dots, z_k$ be the ordered members of $B(c^*)$, and let $Q$ be the $(k-1) \times k$ matrix whose $(i, j)$th element is $(-1)^j f_i(z_j)$. Let $\tilde\xi$ denote a $k$-vector whose $j$th component is the number $\xi(z_j)$. According to (2.10), which, by Theorem 2, is necessary, and the Chebyshev theorem cited above, any optimum $\xi$ must satisfy

$$Q\tilde\xi = 0. \tag{2.14}$$

(Of course, it must also satisfy $\xi(B(c^*)) = 1$.) Now, $Q$ has rank $k - 1$, since, if it had smaller rank, a nontrivial weighted sum of rows of $Q$ would be 0, so that the corresponding linear combination of $f_1, \dots, f_{k-1}$ would vanish at all $k$ points $z_j$, and the $f_i$ could not be a Chebyshev system. The linear equations (2.14) thus have a one-dimensional set of solutions $\tilde\xi$, and clearly at most one of these can be a probability measure. This completes the proof.

If $B(c^*)$ consists of more than $k$ points, an analysis like that above will give information on how large the class of optimum $\xi$'s can be.
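The proof of Theorem 3 is constructive: once $B(c^*) = \{z_1, \dots, z_k\}$ is known, the optimum weights solve (2.14) together with $\sum_j \xi(z_j) = 1$. The sketch below (assuming NumPy; `optimum_weights` is a hypothetical helper) does this for polynomial regression of degree $h$ on $[-1, 1]$, taking $B(c^*)$ to be the alternation points $\cos(j\pi/h)$ of $T_h$.

```python
import numpy as np

def optimum_weights(h):
    """Solve (2.14), Q xi = 0, together with sum(xi) = 1, for f_i(x) = x^(i-1), k = h + 1,
    taking B(c*) to be the alternation points cos(j*pi/h), j = 0, ..., h."""
    k = h + 1
    z = np.cos(np.pi * np.arange(k) / h)      # ordered (decreasing) members of B(c*)
    i = np.arange(k - 1)[:, None]             # row i uses f_{i+1}(x) = x^i
    j = np.arange(k)[None, :]
    Q = (-1.0) ** j * z[None, :] ** i         # (k-1) x k, entries (-1)^j f_{i+1}(z_j)
    A = np.vstack([Q, np.ones(k)])            # append the normalization row
    rhs = np.zeros(k)
    rhs[-1] = 1.0
    return z, np.linalg.solve(A, rhs)

for h in (2, 3, 4, 5):
    z, xi = optimum_weights(h)
    print(h, np.round(z, 4), np.round(xi, 4))
```

For $h = 3$ it returns $\xi(\pm 1) = 1/6$, $\xi(\pm 1/2) = 1/3$, the design quoted in Section 1; for the other small $h$ tried it puts weight $1/(2h)$ at $\pm 1$ and $1/h$ at each interior point.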

Remark on symmetry (invariance): As we have indicated in Section 1, it will sometimes be easy, as in the case of polynomial regression, to infer that there is an optimum $\xi$ with some symmetry property. Formally, suppose that there is a group $G$ of transformations on $\mathfrak{X}$ such that for each $g$ in $G$ there is a transformation $g'$ on the parameter space such that, writing $(g'a)_i$ for the $i$th coordinate of $g'a$, we have $(g'a)_k = a_k$ for $g$ in $G$ and

$$\sum_i a_i f_i(x) = \sum_i (g'a)_i f_i(gx) \tag{2.15}$$

for all $x$ and all $(a_1, \dots, a_k)$. (One may let $g'$ act on the vector of functions $f_i$ instead of on $a$.) Then the problem in terms of the parameters $(g'a)_i$ and the independent variable $gx$ coincides with the original problem. Hence, if $\xi$ is optimum for the original problem, it is also optimum for the above problem in terms of $gx$, and hence the measure $\xi_g$ defined by

$$\xi_g(A) = \xi(gA) \tag{2.16}$$

is optimum for the original problem in terms of $x$. Suppose for the moment that $G$ contains a finite number, say $L$, of elements. Write

$$\bar\xi = \sum_{g \in G} \xi_g / L. \tag{2.17}$$

It is easy to prove that, if $\xi$ is optimum, then so is $\bar\xi$; in fact, this is obvious statistically, since the variance of the average of $L$ b.l.e.'s from the $L$ independent experiments $\xi_g$ with $N/L$ observations each cannot be less than that of the b.l.e. from $\bar\xi$ with $N$ observations (since $\bar\xi$ can be broken up into such experiments), but is clearly equal to the variance of the b.l.e. from $\xi$ based on $N$ observations. Thus, we have:

There exists an optimum design which is symmetric with respect to (invariant under) $G$.

The analogous result can be proved for $G$ compact or satisfying conditions which yield the usual minimax invariance theorem in statistics; see, e.g., [7].
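The averaging argument can also be seen numerically. Because $\xi \mapsto \min_c \int K(x, c)\,\xi(dx)$ is a minimum of linear functions of $\xi$, it is concave, so averaging a design with its images under $G$ can only help when the problem is invariant. The sketch below (assuming NumPy; the helper names and the design are illustrative) uses quadratic regression on $[-1, 1]$ with $G = \{x \mapsto x,\ x \mapsto -x\}$, for which $g'(a_1, a_2, a_3) = (a_1, -a_2, a_3)$ leaves $a_3$ fixed, and compares an arbitrary asymmetric design with its symmetrized version (2.17). The optimum value here is $1/4$, attained by $\xi(\pm 1) = 1/4$, $\xi(0) = 1/2$.

```python
import numpy as np

def info_top(points, weights, k):
    """Left side of (2.8) for f_i(x) = x^(i-1): weighted least-squares residual of x^(k-1)."""
    x, w = np.asarray(points, dtype=float), np.asarray(weights, dtype=float)
    F = np.vander(x, k, increasing=True) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(F[:, :-1], F[:, -1], rcond=None)
    return float(np.sum((F[:, -1] - F[:, :-1] @ coef) ** 2))

def symmetrize(points, weights):
    """(2.17) with G = {identity, x -> -x}, L = 2: average xi with its reflection."""
    pts = np.asarray(points, dtype=float)
    wts = np.asarray(weights, dtype=float)
    return np.concatenate([pts, -pts]), np.concatenate([wts, wts]) / 2.0

k = 3                                    # quadratic regression, inference on a_3
pts = [-1.0, -0.2, 0.3, 1.0]             # an arbitrary asymmetric design
wts = [0.1, 0.4, 0.3, 0.2]
before = info_top(pts, wts, k)
after = info_top(*symmetrize(pts, wts), k)
print(before, after)                     # after >= before, and after <= 1/4
```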

The fact that there exists an optimum symmetric design and an optimum design concentrated on (e.g.) $k$ points does not imply the existence of an optimum design with both of these properties. For example, if $\mathfrak{X} = [-1, 1]$, $k = 2$, $f_1(x) = 1$, and $f_2(x) = x^2$, there is an optimum design concentrated on the two points 0 and 1, but the only symmetric optimum design requires the three points 0, $-1$, and 1. However, in the event that $g'$ does not act (as it does in the example just cited) as the identity for every $g$, we may be able to obtain some simplification. For example, without discussing the most general possibility, let us suppose that $Q$ is a set of integers containing $k$ and such that $(g'a)_j = a_j$ for all $g$ if $j \in Q$, while $\sum_g (g'a)_j = 0$ for $j$ not in $Q$. Consider the problem of finding an optimum design $\tau$ on the space of equivalence classes of $\mathfrak{X}$ under the equivalence $x \sim x'$ if $x' = gx$ for some $g$, where the regression function is $\sum_{i \in Q} a_i f_i(x)$ (at the equivalence class of $x$). If there are $q$ integers in $Q$, there is by Theorem 2 an optimum $\tau^*$ concentrated on at most $q$ points. This $\tau^*$ corresponds to a unique symmetric (with respect to $G$) measure $\xi^*$ on $\mathfrak{X}$, and it is easy to see that (2.10) is satisfied for all $i < k$. Thus, if there are $L$ elements in $G$, this $\xi^*$ is concentrated on at most $qL$ points. For example, in the case of polynomial regression of even degree $h$ $(= k - 1)$ on $[-1, 1]$, $G$ contains two elements and the set $Q$ corresponds to the $q = 1 + h/2$ even powers, and we obtain that there is a symmetric optimum $\xi$ concentrated on at most $h + 2$ points. The actual case (see Ex. 3.1) is that there is a symmetric optimum $\xi$ concentrated on $k = h + 1$ points; the previous argument did not give the best result because $\tau^*$ gave positive probability to the equivalence class of 0, which corresponds to only one point of $\mathfrak{X}$. The best result could, however, be obtained using another argument: since, according to Theorem 3, the optimum $\xi$ is unique, our discussion of two paragraphs above implies that it must be symmetric, and it is thus concentrated on $h + 1$ points. Similarly, one could conclude that there is a symmetric optimum design concentrated on $h + 1$ points when $h$ is odd, either by using Theorem 3, or else by invoking an obvious modification of the previous argument for the case when $(g'a)_k = \pm a_k$. A similar result holds in the setup of Ex. 3.5.

Remark on heteroscedasticity and variable cost: Suppose the second line of (2.3) is replaced by

(2.18) $$\operatorname{Var}(Y_x) = [v(x)\sigma]^2,$$

where $v$ is a known positive continuous function on $\mathfrak{X}$. To avoid trivialities, assume $v(x)$ bounded away from 0. Then, replacing $Y_x$ by $Y_x^* = Y_x/v(x)$ and $f_i(x)$ by $f_i^*(x) = f_i(x)/v(x)$, it is clear that the entire discussion of this section goes through exactly as before (i.e., assuming (2.3)), since the $a_i$ for which $EY_x = \sum a_i f_i(x)$ are the same $a_i$ as those for which $EY_x^* = \sum a_i f_i^*(x)$, and the latter setup satisfies the original condition (2.3) of this section.

If there is a cost $c(x)$ of taking an observation at the point $x$, and the total cost rather than the total number of observations is to be kept constant, it is easily seen that an optimum design is obtained by going through the analysis of this section with $v(x)$ above replaced by $v(x)[c(x)]^{1/2}$.
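A small numerical check of the reduction just described may help. The sketch below assumes a finite set of candidate points with hypothetical regressor rows $f(x_r)'$, variance factors $v(x_r)$ and costs $c(x_r)$; the helper names are ours. It confirms that dividing each row by $v(x)$, or by $v(x)[c(x)]^{1/2}$ when costs differ, turns the weighted information matrix into the ordinary one, and that the coefficients $a_i$ are unchanged. With costs, the measure produced by the analysis is read, under our interpretation, as the share $\eta(x)$ of the budget spent at each point.

```python
import numpy as np

def transformed_regressors(F, v, c=None):
    """Rows f*(x_r) = f(x_r) / v*(x_r), where v* = v (equal costs) or v * sqrt(c)."""
    scale = v if c is None else v * np.sqrt(c)
    return F / scale[:, None]

def information(F, w):
    """sum_r w_r f(x_r) f(x_r)' for a design putting weight w_r on the point of row r."""
    return F.T @ (w[:, None] * F)

def counts_from_budget(eta, c, budget):
    """Observations at each point when the fraction eta_r of the budget is spent at x_r."""
    return budget * eta / c

# Hypothetical setup: quadratic regression on five points of [-1, 1].
x = np.linspace(-1.0, 1.0, 5)
F = np.vander(x, 3, increasing=True)          # rows (1, x, x^2)
v = 1.0 + 0.5 * x**2                          # assumed known factor in Var(Y_x) = [v(x) sigma]^2
c = np.array([1.0, 2.0, 1.0, 2.0, 1.0])       # assumed cost of one observation at each point
w = np.full(5, 0.2)                           # some design (or budget shares)

# Equal costs: ordinary information of the starred problem = weighted information of the original.
M_star = information(transformed_regressors(F, v), w)
M_weighted = F.T @ ((w / v**2)[:, None] * F)
print(np.allclose(M_star, M_weighted))
```

With costs, `transformed_regressors(F, v, c)` plays the role of $f_i(x)/(v(x)[c(x)]^{1/2})$, and `counts_from_budget` converts budget shares back into numbers of observations, $n(x) = C\,\eta(x)/c(x)$ for a total budget $C$.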

Similar remarks will apply to the problems considered in Sections 4 and 5. 


3. Examples of optimum designs in the case of Section 2. 

Example 3.1. Polynomials on $[\alpha, \beta]$. One of the most important practical examples is that where $\mathfrak{X}$ is the closed finite nondegenerate interval $[\alpha, \beta]$ of reals, $k = h + 1$ for some $h > 0$, and $f_j(x) = x^{j-1}$ for $1 \le j \le h + 1$; we hereafter write $b_{j-1} = a_j$, $b = (b_0, \ldots, b_h)$, $d_{j-1} = c_j$, and $d = (d_0, \ldots, d_{h-1})$. Thus, assuming that the regression function is a polynomial of degree $\le h$, we may want to test the hypothesis that it is actually of degree $\le h - 1$, i.e., that $b_h = 0$. (In Section 4 we consider the possibility of testing that the degree is $\le h - m$, where $m$ is specified.) We first note that we can write

$$\sum_{j=0}^{h} b_j x^j = \sum_{j=0}^{h} \bar b_j \left[\frac{2x - \alpha - \beta}{\beta - \alpha}\right]^j,$$

where $\bar b_h = [(\beta - \alpha)/2]^h b_h$; since $(2x - \alpha - \beta)/(\beta - \alpha)$ takes on values in $[-1, 1]$, an optimum strategy for arbitrary $[\alpha, \beta]$ is immediately obtained by an obvious change in location and scale from an optimum strategy in the case $[-1, 1]$, and we may hereafter limit our attention to the latter. Next, we note





282 J. KIEFER AND J. WOLFOWITZ 


that $b_h$ is obviously not estimable unless $\xi$ gives positive probability to at least $h + 1$ points (of course, in practice we need $n > h + 1$ if $\sigma^2$ is unknown and $n \ge h + 1$ if $\sigma^2$ is known). Hence, by Theorem 2 (or by the result of [10] cited in Section 2) there exists an optimum $\xi$ concentrated on exactly $h + 1$ points. We shall actually find a unique $\xi^*$ which satisfies (2.8) and gives positive probability to exactly $h + 1$ points.³ Thus, the phenomenon concerning degrees of freedom in the estimate of $\sigma^2$ which was discussed in the sixth paragraph of Section 2, and which is illustrated in Example 3.4 below, cannot occur in the present example.

The unique Chebyshev $d^*$ (i.e., $c^*$) is well known in this example: $x^h - \sum_{j=0}^{h-1} d_j^* x^j$ is simply the $h$th Chebyshev polynomial (see, e.g., [2]),

(3.1) $$x^h - \sum_{j=0}^{h-1} d_j^* x^j = 2^{1-h}\cos(h \cos^{-1} x) = 2^{-h}\left\{\left[x + (x^2 - 1)^{1/2}\right]^h + \left[x - (x^2 - 1)^{1/2}\right]^h\right\}.$$

Moreover, $m(d^*) = 2^{1-h}$, and this extreme value is attained in magnitude (with successive alternations in sign) by $x^h - \sum_{j=0}^{h-1} d_j^* x^j$ at the $h + 1$ points

(3.2) $$x_j = -\cos(j\pi/h), \qquad 0 \le j \le h.$$

Thus, $B(d^*)$ consists of these $h + 1$ points. Moreover, the above $d^*$ is the unique Chebyshev vector, since $x^0, x^1, \ldots, x^{h-1}$ form a Chebyshev system. According to Theorem 3, the optimum $\xi^*$ is unique. We now show that the unique optimum $\xi^*$ is

(3.3) $$\xi^*(-1) = \xi^*(1) = \frac{1}{2h}, \qquad \xi^*\left(\cos\frac{j\pi}{h}\right) = \frac{1}{h}, \quad 1 \le j \le h - 1.$$


To prove this, we shall verify (2.14) for $\xi = \xi^*$, since this is just (2.10), which by Theorems 1 and 2 is necessary and sufficient for an optimum $\xi$. Since the $d_j^*$'s of (3.1) are zero if $j + h$ is odd, the polynomial of (3.1) is clearly orthogonal (with respect to $\xi^*$) to $x^t$ when $t + h$ is odd. When $t + h$ is even, we can combine the weights $\xi^*(-1)$ and $\xi^*(1)$ and rewrite (2.14) as

(3.4) $$\sum_{j=0}^{h-1} (-1)^j \cos^t(j\pi/h) = 0.$$

Since $\cos^t\theta$ can be written as a linear combination of $\cos t\theta, \cos(t - 2)\theta, \ldots$, it suffices to prove (3.4) with $\cos^t(j\pi/h)$ replaced by $\cos(rj\pi/h)$, where $h + r$

³ For $h = 1$ and 2, the solution is given in [14]. The general solution (3.3) of the problem of Example 3.1, for a design optimum in the sense of Section 2, is also given in the abstract [11] of the apparently contemporaneous work of E. J. Williams. The methods of this author are probably different from ours because he does not seem to use probability measures $\xi$. The authors are indebted to H. L. Lucas for calling their attention to [11], which appeared after submission of the present manuscript.

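is even and $0 \le r < h$. But for such $r$ we have

(3.5) $$\sum_{j=0}^{h-1} (-1)^j \cos(rj\pi/h) = \operatorname{Re} \sum_{j=0}^{h-1} \exp[ij\pi(1 + r/h)] = \operatorname{Re} \frac{1 - \exp[i\pi(h + r)]}{1 - \exp[i\pi(1 + r/h)]} = 0,$$

since the numerator vanishes because $h + r$ is even, while the denominator does not vanish because $1 \le 1 + r/h < 2$.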

It is interesting to compare the design $\xi^*$ of (3.3) with the often used design $\zeta^{(M)}$ (say) which assigns measure $1/M$ to each of the values $(2i - M - 1)/(M - 1)$, $i = 1, 2, \ldots, M$; thus $\zeta^{(M)}$ takes an equal number of observations at each of $M$ equally spaced points ranging from $-1$ to 1. Of course, $M > h$. For such a design with $M$ observations on the interval $[0, M - 1]$, Fisher [8, p. 153] has calculated the left side of (2.4) to be $(h!)^4 M(M^2 - 1)(M^2 - 4)\cdots(M^2 - h^2)/(2h)!(2h + 1)!$. To obtain the corresponding quantity for the interval $[-1, 1]$, we must divide by $[(M - 1)/2]^{2h}$, and we must divide also by $M$ in order to obtain the analogue of the left side of (2.4) for the measure $\zeta^{(M)}$. Since $[m(d^*)]^2 = 2^{2-2h}$, we obtain for the efficiency (see the definition following (2.8)) of $\zeta^{(M)}$

(3.6) $$e(\zeta^{(M)}) = \frac{2^{4h-2}(h!)^4}{(2h)!\,(2h + 1)!} \prod_{i=1}^{h} \frac{M^2 - i^2}{(M - 1)^2}.$$

The best choice of $M$ varies: it is $h + 1$ if $h = 1$ or 2, $h + 2$ if $h = 3$, etc. For the often used procedure $\zeta^{(h+1)}$ we have

(3.7) $$e(\zeta^{(h+1)}) = \frac{2^{4h-2}(h!)^4}{(2h)!\,h^{2h}(h + 1)}.$$

Of course, (3.7) becomes 1 for $h = 1$, since $\zeta^{(2)} = \xi^*$ for $h = 1$; for $h = 2$, (3.7) becomes 8/9; for $h = 3$ it is 256/405 (the best procedure, $\zeta^{(5)}$, has efficiency .72), etc.; for large $h$, by Stirling's approximation, it is approximately $\pi^{3/2} h^{1/2} 2^{2h-1} e^{-2h}$, which goes to zero very rapidly. For $\zeta^{(M)}$ with $M \to \infty$, the efficiency (3.6) approaches $2^{4h-2}(h!)^4/(2h)!(2h + 1)!$, which as $h \to \infty$ is approximately $\pi/8$.
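Formulas (3.6) and (3.7) are easy to tabulate exactly. The sketch below uses rational arithmetic; the function names and the search cap `M_max = 200` are our own choices. It reproduces the values 8/9, 256/405 and .72 quoted above, together with the best values of $M$ for $h \le 3$.

```python
from fractions import Fraction
from math import factorial

def efficiency(h, M):
    """e(zeta^(M)) from (3.6): equal weights on M equally spaced points of [-1, 1], M > h."""
    if M <= h:
        raise ValueError("zeta^(M) needs M > h")
    e = Fraction(2 ** (4 * h - 2) * factorial(h) ** 4,
                 factorial(2 * h) * factorial(2 * h + 1))
    for i in range(1, h + 1):
        e *= Fraction(M * M - i * i, (M - 1) ** 2)
    return e

def efficiency_h_plus_1(h):
    """Closed form (3.7) for zeta^(h+1)."""
    return Fraction(2 ** (4 * h - 2) * factorial(h) ** 4,
                    factorial(2 * h) * h ** (2 * h) * (h + 1))

def best_M(h, M_max=200):
    """The M in h+1, ..., M_max with the largest efficiency (3.6)."""
    return max(range(h + 1, M_max + 1), key=lambda M: efficiency(h, M))

for h in (1, 2, 3, 4):
    M = best_M(h)
    print(h, efficiency_h_plus_1(h), M, float(efficiency(h, M)))
```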

To the experimenter who protests at the above comparison that the design $\zeta^{(M)}$ for some $M > h$ is more to his liking than is the $\xi^*$ of (3.3) because the former will permit him to estimate regression coefficients $a_j$ up to $a_M$ (instead of up to $a_k$), we can only answer that his problem is not the one of the present example, that he is probably using a method of inference (to “choose the polynomial of correct degree”) whose properties are questionable, and that a precise statement of his decision problem would probably lead to a procedure far superior to $\zeta^{(M)}$. In Sections 4 and 5 we shall consider some other related problems which
may be what the experimenter is faced with, rather than the problem of the 
present example. The problem of “fitting the polynomial of best degree” is more 
unwieldy, depending strongly on the somewhat arbitrary choice of losses which 
are to be assigned to errors in estimation as compared with the penalty for using 
a polynomial of large degree. 

Example 3.2. An example where $p > 0$. It is easy to construct examples where the $p$ of Theorem 2 is not 0, as it is in the case of a Chebyshev system. We illustrate the situation with a very simple example. Suppose $\mathfrak{X} = [-1, 1]$, $k = 3$, $f_1(x) = 1$, $f_2(x) = x^2$, $f_3(x) = x + 1$. The expression $x + 1 - c_1 - c_2x^2$ has, within $[-1, 1]$, derivative equal to 0 at $x = 1/(2c_2)$ if $|c_2| \ge 1/2$, and is monotone on $\mathfrak{X}$ if $|c_2| < 1/2$. Thus, a routine computation of $\max_x |x + 1 - c_1 - c_2x^2|$ (whose values at $x = 1$ and $x = -1$ differ by 2, so that the maximum is never less than 1) leads to the conclusion that any $c$ with $c_1 + c_2 = 1$ and $|c_2| \le 1/2$ is Chebyshev; i.e., $p = 1$. Hence, $k - p = 2$, and indeed the design $\xi^*$ for which $\xi^*(-1) = \xi^*(1) = 1/2$ is optimum. The heart of the matter is that $(1, x^2)$ is not a Chebyshev system and that it is possible to estimate $a_3$ optimally without estimating $a_2$ at all.
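Example 3.2 can be checked on a fine grid. This sketch (grid size and names are our own) evaluates $\max_x |x + 1 - c_1 - c_2x^2|$ along the family $c_1 = 1 - c_2$, $|c_2| \le 1/2$, and just outside it. It then computes the criterion of the design $\xi^*(\pm 1) = 1/2$. On the support of $\xi^*$ the functions 1 and $x^2$ coincide, so the least-squares problem is rank deficient; `numpy.linalg.lstsq` handles this by returning a minimum-norm solution.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20001)       # grid on X = [-1, 1]; includes both endpoints

def sup_error(c1, c2):
    """max over the grid of |x + 1 - c1 - c2 x^2|."""
    return float(np.max(np.abs(x + 1 - c1 - c2 * x**2)))

def design_criterion(points, weights):
    """min_c sum_r w_r (f3 - c1 f1 - c2 f2)^2 with f1 = 1, f2 = x^2, f3 = x + 1."""
    sw = np.sqrt(weights)
    A = np.column_stack([np.ones_like(points), points**2]) * sw[:, None]
    y = (points + 1) * sw
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ c
    return float(r @ r)

family = [sup_error(1 - c2, c2) for c2 in np.linspace(-0.5, 0.5, 11)]
outside = sup_error(1 - 0.6, 0.6)        # |c2| > 1/2 with c1 + c2 = 1
xi_star_value = design_criterion(np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
print(max(family), min(family), outside, xi_star_value)
```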

Example 3.3. An example where the optimum $\xi^*$ is not unique. There are many obvious examples of this kind, as we have indicated in the paragraph following the proof of Theorem 3. For example, one simple example is given by $\mathfrak{X} = [-1, 1]$, $k = 2$, $f_1(x) = 1$, $f_2(x) = 1 + \sin 10x$ (any $\xi$ which assigns measure 1/2 to each of the sets where $\sin 10x = 1$ or $-1$ satisfies (2.10)); an even more trivial one is $k = 1$, $f_1(x) = 1$, where every strategy is optimum.

Example 3.4. An example where a nonoptimum $\xi$ may be preferable. This example illustrates the phenomenon alluded to in the text, wherein a design $\xi$ which is not optimum in the sense defined in Section 2 may be preferable to an optimum design $\xi^*$ for use (e.g., in testing a hypothesis about $a_2$) because the latter yields one less degree of freedom for the estimate of $\sigma^2$. Let $\epsilon$ be a fixed small positive number, and suppose that $\mathfrak{X}$ consists of the three integers 0, 1, and 2, that $k = 2$, and that $f_1(x) = x^2$ and $f_2(x) = 1 + (1 + \epsilon)x$. It is easily computed that the Chebyshev $c^*$ is $1 + 3\epsilon/5$, that $B(c^*)$ consists of the points 1 and 2, that $m(c^*) = 1 + 2\epsilon/5$, and that the optimum $\xi^*$ is given by $\xi^*(1) = 1 - \xi^*(2) = 4/5$. Thus, the efficiency of the design which takes all observations at $x = 0$ ($\xi(0) = 1$) and estimates $a_2$ in the obvious way is $(1 + 2\epsilon/5)^{-2}$; when $\epsilon$ is small, this is more than offset by the extra degree of freedom for estimating $\sigma^2$ (e.g., 4 for the latter design against 3 for $\xi^*$, when 5 observations are taken), for the problem of testing a hypothesis about $a_2$ or giving a confidence interval on $a_2$.
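The numbers quoted in Example 3.4 can be reproduced directly. In this sketch $\epsilon = 0.1$ is an illustrative choice and the helper names are ours; the one-regressor least-squares problem is solved in closed form.

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0])          # the design space of Example 3.4

def f1(x):
    return x**2

def f2(x, eps):
    return 1 + (1 + eps) * x

def criterion(xi, eps):
    """min_c sum_x xi(x) [f2(x) - c f1(x)]^2 (one regressor, closed-form least squares)."""
    a, b = f1(X), f2(X, eps)
    den = np.sum(xi * a * a)
    c = np.sum(xi * a * b) / den if den > 0 else 0.0   # c is irrelevant when f1 = 0 on the support
    return float(np.sum(xi * (b - c * a) ** 2))

eps = 0.1                                   # a "small" epsilon, chosen for illustration
c_star = 1 + 3 * eps / 5
dev = f2(X, eps) - c_star * f1(X)           # equals 1, 1 + 2 eps/5, -(1 + 2 eps/5) on X
m_star = float(np.max(np.abs(dev)))
xi_star = np.array([0.0, 4 / 5, 1 / 5])
orth = float(np.sum(xi_star * dev * f1(X)))  # orthogonality condition (2.10)
eff_x0 = criterion(np.array([1.0, 0.0, 0.0]), eps) / criterion(xi_star, eps)
print(m_star, orth, eff_x0, (1 + 2 * eps / 5) ** -2)
```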

Example 3.5. A multidimensional example. Let $\mathfrak{X}$ be the set of all points $(x_1, x_2)$ in the Euclidean plane for which $|x_1| \le 1$ and $|x_2| \le 1$. Let $k = 6$ and suppose that the functions $f_i$ are, in order, $1, x_1, x_2, x_1^2, x_2^2$, and $x_1x_2$; thus, for example, we may be testing the hypothesis that a quadratic function of two variables has no interaction term $a_6x_1x_2$, i.e., that $a_6 = 0$. An easy approach to obtaining an optimum $\xi$ is the third method mentioned in Section 2: An obvious guess of a $\xi$ which might be optimum is that measure $\xi'$ (say) which assigns probability 1/4 to each corner of the square $\mathfrak{X}$. Thus, writing $c_1 + c_4 + c_5 = \bar c$, we see that $K(\xi', c)$ is symmetric in each of the variables $c_2$, $c_3$, and $\bar c$ (which are the only quantities on which it depends; indeed $K(\xi', c) = 1 + c_2^2 + c_3^2 + \bar c^2$), so that $\min_c K(\xi', c) = K(\xi', c') = 1$ is attained for any $c'$ for which $c_2 = c_3 = \bar c = 0$. Let $c''$ have all five of its components equal to zero. Then, clearly, $\max_x K(x, c'') = 1$. Thus, by the discussion following Theorem 2, we have proved that $\xi'$ is optimum. Another way of verifying the optimality of
$\xi'$ is to note that, in the terminology of the remark on symmetry of Section 2, $G$ is the group of symmetries of the square, and an analogue of the last argument mentioned there for the case of polynomial regression with $h$ odd obtains $\xi'$ from the optimum design $\tau^*$ which assigns mass 1 to $(1, 1)$ for the problem of estimating $a_6$ on $0 \le x \le y \le 1$ when the regression function is $a_6xy$. We note that only $a_2$, $a_3$, $a_6$, and $a_1 + a_4 + a_5$ are estimable for this design. The fact that only four linearly independent estimable linear parametric functions exist here is reflected in the fact that, in the notation of Section 2, $p = 2$. This can be seen by noting that, if $c' = (\epsilon + \delta, 0, 0, -\epsilon, -\delta)$, where $\epsilon$ and $\delta$ are sufficiently small, then $\max_x K(x, c')$ is still equal to unity, so $c'$ is Chebyshev.
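The two verifications in Example 3.5 can be repeated numerically. The sketch below (the grid resolution and the values $\epsilon = 0.1$, $\delta = -0.05$ are our choices) computes $\min_c K(\xi', c)$ by weighted least squares and counts the linearly independent estimable functions under $\xi'$. It then evaluates $\max_x K(x, c)$ on a grid, first for $c''$ and then for $c'$.

```python
import numpy as np

def regressors(p):
    """Columns f_1, ..., f_5 = 1, x1, x2, x1^2, x2^2 at the points p (shape (n, 2))."""
    x1, x2 = p[:, 0], p[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

def f6(p):
    """The interaction term f_6 = x1 * x2."""
    return p[:, 0] * p[:, 1]

corners = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
w = np.full(4, 0.25)                                   # the design xi'

# min_c K(xi', c) by weighted least squares; lstsq copes with the rank deficiency.
sw = np.sqrt(w)
A = regressors(corners) * sw[:, None]
y = f6(corners) * sw
c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
min_K = float(np.sum((y - A @ c_hat) ** 2))

# Number of linearly independent estimable functions under xi'.
n_estimable = np.linalg.matrix_rank(np.column_stack([regressors(corners), f6(corners)]))

# max_x K(x, c) over a grid of the square.
g = np.linspace(-1.0, 1.0, 201)
G1, G2 = np.meshgrid(g, g)
grid = np.column_stack([G1.ravel(), G2.ravel()])

def max_K(c):
    return float(np.max((f6(grid) - regressors(grid) @ c) ** 2))

eps, delta = 0.1, -0.05                                 # "sufficiently small" values (our choice)
c_prime = np.array([eps + delta, 0.0, 0.0, -eps, -delta])
print(min_K, n_estimable, max_K(np.zeros(5)), max_K(c_prime))
```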

Other examples. Many other examples of optimum designs can be obtained from the extensive literature on Chebyshev approximation problems. For example, Section 37 of [2] can be used to obtain such a design for the setup of Example 3.1 wherein $f_k$ is altered to $f_k(x) = 1/(x - c)$ with $c > \beta$.


4. The case of several regression coefficients. We consider now the setup of (2.1)-(2.3) (see also (2.18)) in the case where we are interested in inference about more than one of the $a_i$. In some estimation problems, a treatment like that of Section 5, wherein the behavior of the function $\sum a_i f_i$ rather than that of the $a_i$ themselves is considered, will seem appropriate. However, in most problems of testing hypotheses, as well as in many problems of estimation (especially where the inference is not about all of the $a_i$), the treatment of the present section may seem appropriate.

We must first choose a criterion of optimality of a design for a problem of estimation or testing hypotheses about $s$ of the $a_i$, say $a_{k-s+1}, \ldots, a_k$. Of course, it is easy to specify a loss function and a criterion (minimax, etc.) for choosing a design and associated decision procedure; but, as shown in [1], such a simple criterion as that of maximizing the minimum power of a test on an appropriate contour (M-optimality) will usually lead to most unwieldy computations. Even the corresponding local criterion on the power near the null hypothesis (L-optimality) will lead to difficult computations. Two other criteria considered in [1] are D-optimality and E-optimality. In the present setting, $n$ being fixed, a design $d^*$ is said to be D-optimum if $a_{k-s+1}, \ldots, a_k$ are all estimable under $d^*$ and if, among all designs for which these parameters are estimable, denoting by $\sigma^2 V_d$ the covariance matrix of the b.l.e.'s of these parameters when design $d$ is used, $\det V_d$ is a minimum for $d = d^*$. A design is said to be E-optimum in the above setting if the maximum eigenvalue of $V_d$ is a minimum for $d = d^*$. The relevance of these criteria for problems of testing hypotheses and of estimation was indicated in [1] and the reference cited there. It was shown that D-optimality is generally more meaningful. There is an additional reason why this is so in problems of the type considered here: Consider the polynomial setup of Example 3.1 for any value $k > 2$ ($h > 1$) and $s > 1$. It is clear that the change of scale $x' = bx$ does not leave invariant the criterion of E-optimality: a change in the scale of measurement can change the E-optimum design. This is unsatisfactory from both an intuitive point of view (the optimum design depends on the choice
of a unit of scale) and from a practical one; one would have to table optimum designs in such problems as a function of $\alpha, \beta$. (A similar remark, of course, applies to L-optimum and M-optimum designs.) On the other hand, D-optimality is invariant under such transformations. The same result is true under a change of origin (or a change of both scale and origin) in this polynomial example: D-optimality is invariant, but E-optimality is not.
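The scale dependence of E-optimality can be seen in a small computation. As a simplifying assumption, the sketch restricts attention to symmetric three-point designs $\xi(\pm a) = w$, $\xi(0) = 1 - 2w$ for quadratic regression ($f = (1, x, x^2)$, $s = k = 3$) on $[-a, a]$, and it optimizes $w$ on a grid. The D-criterion $\det M(\xi)$ is maximized at $w = 1/3$ for every $a$, because $\det M(\xi) = 4w^2(1 - 2w)a^6$ here. By contrast, the $w$ maximizing the smallest eigenvalue of $M(\xi)$ (equivalently, minimizing the largest eigenvalue of $V$) is 0.2 for $a = 1$ and moves well away from that value for $a = 2$.

```python
import numpy as np

def info_matrix(a, w):
    """M(xi) for f(x) = (1, x, x^2) and the design xi(-a) = xi(a) = w, xi(0) = 1 - 2w."""
    pts = np.array([-a, 0.0, a])
    wts = np.array([w, 1.0 - 2.0 * w, w])
    F = np.vander(pts, 3, increasing=True)      # rows (1, x, x^2)
    return F.T @ (wts[:, None] * F)

W = np.linspace(0.001, 0.499, 4981)             # candidate values of w (step 1e-4)

def d_optimal_w(a):
    """w maximizing det M(xi), i.e. minimizing det V for all three coefficients."""
    return float(W[np.argmax([np.linalg.det(info_matrix(a, w)) for w in W])])

def e_optimal_w(a):
    """w maximizing the smallest eigenvalue of M(xi), i.e. minimizing the largest of V."""
    return float(W[np.argmax([np.linalg.eigvalsh(info_matrix(a, w))[0] for w in W])])

for a in (1.0, 2.0):
    print(a, d_optimal_w(a), e_optimal_w(a))
```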

Thus, although D-optimality is not an appropriate criterion in all problems, for 
the reasons given in the previous paragraph it seems reasonable to investigate 
this criterion as a first attack on the problem of finding optimum designs. We 
shall thus develop a method for obtaining D-optimum designs in the remainder 
of this section, except that we shall indicate briefly at the end of this section how 
various other criteria can be treated similarly. 

Proceeding as in Section 2, let $h_{tj}$ be numbers such that, for $i \le k - s < t$, the functions $f_i$ are orthogonal to the functions $\bar f_t = f_t - \sum_{j \le k-s} h_{tj} f_j$ in the sense of (2.5), i.e.,

(4.1) $$\sum_{r=1}^{n} f_i(x_r)\bar f_t(x_r) = 0, \qquad i \le k - s < t.$$

Then, as in the discussion of (2.6), we see that $\sigma^2$ times the inverse of the covariance matrix $\sigma^2 V_d$ of best linear estimators of $a_{k-s+1}, \ldots, a_k$ has elements $\sum_r \bar f_i(x_r)\bar f_j(x_r)$, $k - s < i, j \le k$. For $t > k - s$, let $f_t^* = \bar f_t - \sum_{k-s<j<t} g_{tj}\bar f_j$ be orthogonal to $\bar f_j$ for $k - s < j < t$. Since the linear transformation which takes the $\bar f_t$ into the $f_t^*$, $k - s < t \le k$, has determinant 1, and since $\sum_r f_i^*(x_r) f_j^*(x_r) = 0$ if $k - s < i < j$, we obtain

(4.2) $$\det V_d^{-1} = \prod_{i > k-s} \sum_r [f_i^*(x_r)]^2.$$

Now, $f_i^*$ is clearly $f_i$ minus the projection of $f_i$ on the linear space spanned by $f_1, f_2, \ldots, f_{i-1}$. Thus, the $i$th term in the product of (4.2) is just the expression of (2.7) with $k$ replaced by $i$. Finally, then, making the same approximation as in Section 2 regarding the representation of the class of all designs by the class of all probability measures $\xi$ on $\mathfrak{X}$, we have demonstrated, to within this approximation, the validity of the following definition, wherein $c^{(j)}$ denotes a vector $(c_1^{(j)}, \ldots, c_{j-1}^{(j)})$ of $j - 1$ components:

DEFINITION. A measure $\xi^*$ in $\Xi$ is said to be D-optimum (for the parameters $a_{k-s+1}, \ldots, a_k$) if

(4.3) $$\prod_{j>k-s} \min_{c^{(j)}} \int \Big[f_j(x) - \sum_{i=1}^{j-1} c_i^{(j)} f_i(x)\Big]^2 \xi^*(dx) = \max_{\xi \in \Xi} \prod_{j>k-s} \min_{c^{(j)}} \int \Big[f_j(x) - \sum_{i=1}^{j-1} c_i^{(j)} f_i(x)\Big]^2 \xi(dx).$$

Of course, (4.3) reduces to (2.8) in the case $s = 1$. When $f_1$ is a constant, a $\xi$ which is optimum for $s = k - 1$ is also optimum for $s = k$.
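Identity (4.2) is what turns $\det V_d^{-1}$ into the product in (4.3), and it can be checked for a design with finite support. In this sketch the support points and weights are hypothetical random values and the regressors are $1, x, x^2, x^3$. The determinant of the Schur complement of $M(\xi)$ belonging to the last $s$ coefficients is compared with the product of the successive weighted residual sums of squares that appear in (4.3).

```python
import numpy as np

def residual_ss(F, w, j):
    """min_c sum_r w_r (F[r, j] - sum_{i<j} c_i F[r, i])^2  (0-based column j)."""
    sw = np.sqrt(w)
    y = F[:, j] * sw
    if j == 0:
        return float(y @ y)
    A = F[:, :j] * sw[:, None]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def det_inverse_cov(F, w, s):
    """det of the Schur complement of M(xi) for the last s coefficients (det V^{-1} per observation)."""
    M = F.T @ (w[:, None] * F)
    p = M.shape[0] - s
    if p == 0:
        return float(np.linalg.det(M))
    S = M[p:, p:] - M[p:, :p] @ np.linalg.solve(M[:p, :p], M[:p, p:])
    return float(np.linalg.det(S))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=6)        # hypothetical support points
w = rng.dirichlet(np.ones(6))             # hypothetical design weights
F = np.vander(x, 4, increasing=True)      # f = (1, x, x^2, x^3), so k = 4
k = F.shape[1]
for s in range(1, k + 1):
    lhs = det_inverse_cov(F, w, s)
    rhs = np.prod([residual_ss(F, w, j) for j in range(k - s, k)])
    print(s, lhs, rhs)
```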

We note that it is a consequence of this definition that an optimum design is optimum for all values of $\sigma^2$.

For the special case where $s = k$ and $\mathfrak{X}$ consists of $k$ points, it is easy to prove that the unique optimum $\xi$ puts mass $1/k$ on each point. For if $A$ is the matrix whose $(i, j)$ element is $f_i(x_j)$ and $B$ is the diagonal matrix with $\xi(x_j)$ the diagonal element in the $j$th row, an optimum design maximizes $\det(ABA') = (\det A)^2 \det B$, and $\det B = \prod_j \xi(x_j)$ is maximized, subject to $\sum_j \xi(x_j) = 1$, only when every $\xi(x_j) = 1/k$. This argument has been employed by Hoel in the problem considered by him; see Example 4 below.
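For $\mathfrak{X}$ with $k$ points, the argument reduces to maximizing $\det B = \prod_j \xi(x_j)$. The sketch uses a hypothetical random matrix $A$, whose entries stand for the values $f_i(x_j)$, to check the factorization $\det(ABA') = (\det A)^2 \det B$. It also checks that randomly drawn designs never beat the uniform one.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.normal(size=(k, k))            # hypothetical values f_i(x_j); nonsingular with probability 1

def d_criterion(xi):
    """det M(xi) = det(A B A') with B = diag(xi(x_1), ..., xi(x_k))."""
    return float(np.linalg.det(A @ np.diag(xi) @ A.T))

uniform = np.full(k, 1.0 / k)
others = rng.dirichlet(np.ones(k), size=1000)      # random competing designs
best_other = max(d_criterion(xi) for xi in others)
print(d_criterion(uniform), best_other)
```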

The methods of Section 2 do not directly yield anything here for the general problem. The analogue of Theorem 1 is essentially empty, since the various $B(c^{(j)})$'s for $c^{(j)}$ Chebyshev will not in general coincide. The game-theoretic approach is inapplicable because the product on the left side of (4.3) is not linear in $\xi^*$; moreover, the product of the integrals (before minimizing over the $c^{(j)}$) is not convex in the $c^{(j)}$'s, since $u^2v^2$ is not a convex function of $u$ and $v$. The following analysis will, however, yield a method for obtaining an optimum $\xi$.

For $j > k - s$, let

(4.4) $$F_j(\xi) = \min_{c^{(j)}} \int \Big[f_j(x) - \sum_{i=1}^{j-1} c_i^{(j)} f_i(x)\Big]^2 \xi(dx).$$


In $s$-dimensional Euclidean space $R^s$, let $S$ be the set of all points $F(\xi) = (F_{k-s+1}(\xi), \ldots, F_k(\xi))$ for $\xi$ in $\Xi$. Although $S$ may not be convex, it possesses the following “upper convexity” property, which is all we require: For any $\xi_1$ and $\xi_2$ in $\Xi$ and any $\lambda$ with $0 < \lambda < 1$,

(4.5) $$F_j(\lambda\xi_1 + (1 - \lambda)\xi_2) \ge \lambda F_j(\xi_1) + (1 - \lambda)F_j(\xi_2)$$

for all $j > k - s$. In fact, (4.5) is an immediate consequence of the linearity in $\xi$ of the integral of (4.4), since a minimum of functions that are linear in $\xi$ is concave in $\xi$.
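The concavity (4.5) is also easy to observe numerically. The sketch below evaluates $F_j$ of (4.4) by weighted least squares; the random designs on a common seven-point support and the regressors $1, x, x^2, x^3$ are all hypothetical. It records the differences $F_j(\lambda\xi_1 + (1-\lambda)\xi_2) - \lambda F_j(\xi_1) - (1-\lambda)F_j(\xi_2)$, which should never be negative beyond rounding error.

```python
import numpy as np

def F_j(F, w, j):
    """F_j(xi) of (4.4): weighted residual sum of squares of column j on columns 0..j-1 (0-based)."""
    sw = np.sqrt(w)
    y = F[:, j] * sw
    if j == 0:
        return float(y @ y)
    A = F[:, :j] * sw[:, None]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 7)                 # common support of the two designs
F = np.vander(x, 4, increasing=True)          # f = (1, x, x^2, x^3)
xi1, xi2 = rng.dirichlet(np.ones(7), size=2)  # two hypothetical designs

gaps = []
for lam in np.linspace(0.0, 1.0, 11):
    mix = lam * xi1 + (1 - lam) * xi2
    for j in range(1, 4):                     # 0-based columns x, x^2, x^3, i.e. f_2, f_3, f_4
        gaps.append(F_j(F, mix, j) - (lam * F_j(F, xi1, j) + (1 - lam) * F_j(F, xi2, j)))
print(min(gaps))
```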

Let $u_{k-s+1}, \ldots, u_k$ be the coordinate functions of $R^s$. For $\delta > 0$, let $G_\delta$ be the set of all points in $R^s$ with all coordinates positive and $\prod_j u_j \ge \delta$. Let $G_\delta'$ be the subset of $G_\delta$ where $\prod_j u_j = \delta$. We note that $G_\delta$ is convex. Suppose that $S$ is closed (this is easily proved from (4.4) if @ is large enough so that $\Xi$ is compact; the modification which is needed if $\Xi$ is not closed is trivial, anyway), and let $\delta_0$ be the largest value of $\delta$ such that $G_\delta$ and $S$ have a nonempty intersection. (Such a $\delta_0$ exists since $S$ has points with all coordinates positive.) If $T$ is the convex hull of $S$, property (4.5) implies that $\delta_0$ is also the largest value of $\delta$ such that $G_\delta$ and $T$ have a nonempty intersection. Hence, applying the separation theorem to $G_{\delta_0}$ and $T$, we conclude that there is a hyperplane $L$ with positive direction cosines such that $L$ separates $G_{\delta_0}$ and $S$. Thus, any point $F(\xi^*)$ in $G_{\delta_0} \cap S$ clearly maximizes $\prod_j F_j(\xi)$ (i.e., $\xi^*$ satisfies (4.3)); and, for positive numbers $\lambda_j$ for which $L$ is given by $\sum_j \lambda_j u_j =$ constant, that point maximizes $\sum_j \lambda_j F_j(\xi)$. Finally, since all points of $G_{\delta_0}'$ are extreme, $L$ intersects $G_{\delta_0}$ in exactly one point, as does therefore $S$.

Before summarizing the above results, we note that, for $\lambda = (\lambda_{k-s+1}, \ldots, \lambda_k)$
with all $\lambda_j > 0$, the payoff function

(4.6) $$K_\lambda(x, c) = \sum_{j>k-s} \lambda_j \Big[f_j(x) - \sum_{i<j} c_i^{(j)} f_i(x)\Big]^2,$$

where $c = (c^{(k-s+1)}, \ldots, c^{(k)})$, satisfies all of those conditions satisfied by the function $K$ of Section 2 which were used in the proof of the game-theoretic results of the lemma there. Thus, that lemma is valid when $K$ is replaced by $K_\lambda$.⁴ The function $K_\lambda$ is of course no longer in a form suitable to make use of Chebyshev approximation results. However, for any $\lambda$, if $c_\lambda$ is minimax for the payoff function $K_\lambda$, we can still characterize maximin $\xi_\lambda$'s in terms of the set $B_\lambda(c_\lambda)$ (say), defined to be the set of $x$ for which $K_\lambda(x, c_\lambda)$ achieves its maximum. With this interpretation of symbols, the analogue of (2.10) is proved here exactly as in (2.11).

We have thus proved the following,⁵ where $C$ now stands for the set of vectors $c = (c^{(k-s+1)}, \ldots, c^{(k)})$ and $c^{(j)} = \{c_i^{(j)}\}$ stands for a vector of this type:

THEOREM 4. The game of $\Xi \times C$ with payoff function $K_\lambda$ is determined. If $\xi_\lambda$ is maximal with respect to $c_\lambda$ while $c_\lambda$ is minimal with respect to $\xi_\lambda$, then $\xi_\lambda$ is maximin. Thus, if $c_\lambda$ is minimax and

(4.8) $$\xi_\lambda(B_\lambda(c_\lambda)) = 1$$

and

(4.9) $$\int \Big[f_j(x) - \sum_{t<j} c_t^{(j)} f_t(x)\Big] f_i(x)\, \xi_\lambda(dx) = 0$$

for $i < j$ and $k - s < j \le k$, then $\xi_\lambda$ is maximin; moreover, every maximin $\xi_\lambda$ satisfies (4.8) and (4.9) for every minimax $c_\lambda$. There is, to within a multiplicative constant, a unique value $\lambda^*$ of $\lambda$ such that $\prod_j F_j(\xi_\lambda)$ is a maximum for $\lambda = \lambda^*$ and some $\xi_{\lambda^*}$. Those $\xi_{\lambda^*}$ which maximize $\prod_j F_j(\xi_{\lambda^*})$, and no other $\xi$'s, are optimum. $F(\xi_{\lambda^*})$ is the same for any optimum $\xi_{\lambda^*}$.

We now consider an example. 

Example 4. Consider the setup of Example 3.1, where (see the end of the second paragraph of the present section) we may suppose $\alpha = -1$, $\beta = 1$. Suppose $k = 3$ ($h = 2$) and $s = 2$; as we have remarked earlier, the optimum design obtained below will also obviously be D-optimum for the case $s = 3$. An


⁴ That part of the lemma which concerns the number $k - p$ is valid when $k$ is replaced by $1 + s(2k - s - 1)/2$ ($= 1 +$ number of components of $c$) in the statement of the lemma. However, this is of no use to us, since it may be that no maximin strategy on the specified number of points is optimum. For example, in the set-up of Example 5.2 below with $s = k = 2$, one can verify that the $\lambda^*$ of Theorem 4 is $(15/4, 1)$, and that any $\xi^*$ with first and second components equal is maximin, but only $(4/15, 4/15, 7/15)$ is optimum.

It is trivial that the optimum strategy need be concentrated on no more than $1 + k(k + 1)/2$ points. For the criterion of optimality (4.3) involves $\xi$ only through the elements (5.2) below of the matrix $M(\xi)$. These matrices form a convex body of dimensionality at most $k(k + 1)/2$, spanned by matrices of $\xi$'s concentrated on a single point. Hence any $M(\xi)$ is a linear convex combination of at most $1 + k(k + 1)/2$ extreme elements.

⁵ See also footnote 6.

elegant solution to this problem for general $k$ and $s = k$ has been given by P. G. Hoel [15] (see also Example 5.1 below). The case $s < k - 1$ does not seem to yield to his attack. The present problem is discussed here as an illustration of our methods. We may take 1 and $\gamma$ for the components of $\lambda$, and write $K_\gamma'(x, d) = (x - d_0)^2 + \gamma(x^2 - d_1x - d_2)^2$ in place of $K_\lambda$. For fixed $\gamma$, one may guess that there will be a maximin strategy $\xi_\gamma$ of the form $\xi_\gamma(-1) = \xi_\gamma(1) = a_\gamma$, $\xi_\gamma(0) = 1 - 2a_\gamma$, for some $a_\gamma$. With respect to such a $\xi_\gamma$, the minimal strategy (which must merely satisfy the orthogonality relation (4.9)) is obviously $d_0 = d_1 = 0$, $d_2 = 2a_\gamma$. For this choice $d_\gamma$ (say) of $d$, we obtain $K_\gamma'(\xi_\gamma, d_\gamma) = \gamma[2a_\gamma - 4a_\gamma^2] + 2a_\gamma$. This is maximized by $a_\gamma = \min(1/2, (\gamma + 1)/4\gamma)$, and for the strategy $\xi_\gamma^*$ corresponding to this value of $a_\gamma$ we obtain


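(4.10) $$\min_d K_\gamma'(\xi_\gamma^*, d) = \begin{cases} (\gamma + 1)^2/4\gamma & \text{if } \gamma > 1, \\ 1 & \text{if } \gamma \le 1. \end{cases}$$

On the other hand,

(4.11) $$\min_d \max_x K_\gamma'(x, d) \le \max_x K_\gamma'(x, d_\gamma).$$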


Since $K_\gamma'(x, d_\gamma)$ is convex in $x^2$, its maximum is attained at either $x^2 = 0$ or $x^2 = 1$, and an easy computation shows that the right side of (4.11) is in fact equal to the right side of (4.10). Thus, we have proved that $\xi_\gamma^*$ is maximin. Finally, since $F_2(\xi_\gamma^*) = 2a_\gamma$ and $F_3(\xi_\gamma^*) = 2a_\gamma(1 - 2a_\gamma)$, we have $F_2(\xi_\gamma^*)F_3(\xi_\gamma^*) = 4a_\gamma^2(1 - 2a_\gamma)$, which is maximized by $a_\gamma = 1/3$. Thus, an optimum design for this problem is $\xi(-1) = \xi(0) = \xi(1) = 1/3$. Of course the optimum designs for a given set of $f_i$ will depend on $s$, as exemplified by the different results obtained in Example 3.1 and Example 4.
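The computations of Example 4 can be reproduced numerically. In this sketch (names ours), `F2_F3` evaluates $F_2$ and $F_3$ of (4.4) for the symmetric design with weight $a$ at $\pm 1$. The functions `lower_value` and `upper_value` evaluate, for several $\gamma$, the two quantities compared in (4.10) and (4.11). Finally, a grid search over $a$ locates the maximum of $F_2F_3 = 4a^2(1 - 2a)$ within this symmetric family.

```python
import numpy as np

def F2_F3(a):
    """F_2 and F_3 of (4.4) for f = (1, x, x^2) and xi(-1) = xi(1) = a, xi(0) = 1 - 2a."""
    x = np.array([-1.0, 0.0, 1.0])
    sw = np.sqrt(np.array([a, 1 - 2 * a, a]))
    F = np.vander(x, 3, increasing=True) * sw[:, None]
    out = []
    for j in (1, 2):                     # 0-based columns x and x^2, i.e. f_2 and f_3
        coef, *_ = np.linalg.lstsq(F[:, :j], F[:, j], rcond=None)
        r = F[:, j] - F[:, :j] @ coef
        out.append(float(r @ r))
    return tuple(out)

def a_gamma(gamma):
    return min(0.5, (gamma + 1) / (4 * gamma))

def lower_value(gamma):
    """min_d K'_gamma(xi*_gamma, d) = F_2 + gamma F_3 at a = a_gamma (left side of (4.10))."""
    F2, F3 = F2_F3(a_gamma(gamma))
    return F2 + gamma * F3

def upper_value(gamma, xs=np.linspace(-1.0, 1.0, 2001)):
    """max_x K'_gamma(x, d_gamma) with d_gamma = (0, 0, 2 a_gamma) (right side of (4.11))."""
    a = a_gamma(gamma)
    return float(np.max(xs**2 + gamma * (xs**2 - 2 * a) ** 2))

grid = np.linspace(0.01, 0.49, 4801)
products = [np.prod(F2_F3(a)) for a in grid]
a_opt = float(grid[int(np.argmax(products))])
print(a_opt, [(g, lower_value(g), upper_value(g)) for g in (0.5, 1.0, 3.0)])
```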

We shall now mention briefly methods for obtaining designs which are optimum in two other senses. Although it is not difficult to characterize E-optimum procedures in simple examples, they often seem much harder to calculate than D-optimum ones. Somewhat easier is the characterization of that design which minimizes the maximum eigenvalue of the covariance matrix of best linear estimators of the regression coefficients of the $f_t^*$ (the regression function being expressed in terms of the $f_t$ for $t \le k - s$ and of the $f_t^*$ for $t > k - s$), i.e., of $L_dV_dL_d'$, where $L_d$ is a square matrix with ones on the main diagonal and zeros above it. (The $f_t^*$ depend on the design, which indicates the intuitive weakness of this criterion; however, as pointed out in [1], the criterion of E-optimality, which has often been considered in the literature, suffers from a similar shortcoming.) Again making the approximation that we do not restrict $n\xi$ to be integer-valued, this criterion amounts to finding that $\xi$ which maximizes $\min_{j>k-s} F_j(\xi)$; i.e., if $\delta_0$ is the largest value of $\delta$ for which the orthant $H_\delta = \{\min_j u_j \ge \delta\}$ intersects $S$ nonvacuously, those $\xi$ for which $F(\xi)$ is in $H_{\delta_0} \cap S$ are the optimum procedures with respect to this criterion. Another criterion which has been considered in the literature, especially in estimation problems, is that of minimizing the “average variance”, $\sigma^2 s^{-1} \operatorname{trace}(V_d)$. Defining $F_j'(\xi)$ to be the expression of (4.4) with the sum in the integrand taken only from 1 to $k - s$, this criterion
amounts to minimizing $\sum_{j>k-s} F_j'(\xi)$. Replacing $S$ by the set of points $F'(\xi) = (F_{k-s+1}'(\xi), \ldots, F_k'(\xi))$, and restricting the sum over $i$ in (4.6) to values $\le k - s$, this amounts to finding the maximin $\xi$'s for a $\lambda$ with all components equal. These maximin $\xi$'s for the original $S$ and $K_\lambda$ (with all $\lambda_j$ equal) would of course minimize the average variance of the b.l.e.'s of regression coefficients of the $f_t^*$; i.e., would minimize the trace of $L_dV_dL_d'$. Criteria like that of minimizing the average variance are subject to the same criticisms as E-optimality.

Remarks. As in the problem of Section 2, one can prove that the symmetry condition (2.15) and the obvious analogue of the condition of the line above (2.15) imply the existence of a symmetrical optimum $\xi$ for any of the criteria considered in the present section. For example, from (4.5) it follows at once that, if $\xi$ is D-optimum, then the symmetrical $\bar\xi$ defined by (2.17) is also D-optimum. Remarks analogous to those of Section 2 on the number of points at which a symmetrical optimum $\xi$ will be concentrated clearly hold in the problems of this section. We note that the choice of the form of $\xi_\gamma$ in Example 4 is motivated by symmetry considerations, although the optimum weights must be computed in any approach.

The remark concerning the modification (2.18) applies also to the problems of this section.


5. Estimation of the whole regression function. In the setup described by (2.1)-(2.3), suppose the problem is one of estimation concerning all the $a_i$. One approach has been indicated in Section 4. Another approach is to think of the problem not as one of estimating the parameters $a_i$, but rather as one of estimating the entire function $\sum a_i f_i$. Thus, if $g$ is the estimate of $\sum a_i f_i$, it is desired to make some measure of the average deviation of $g$ from $\sum a_i f_i$ small in some sense, by choosing an appropriate design. The most obvious possibilities of such measures are perhaps (1) $\sup_a E W(\sup_x |g(x) - \sum a_i f_i(x)|)$, where $W$ is nondecreasing; (2) the integral with respect to some measure $\mu$ on $\mathfrak{X}$ of $\sup_a E W(|g(x) - \sum a_i f_i(x)|)$; (3) the supremum on $\mathfrak{X}$ and $a$ of $E W(|g(x) - \sum a_i f_i(x)|)$. Of these three possibilities, the first is perhaps the most meaningful for most applications (with perhaps the inclusion of a weight function $A(x)$ multiplying $|g(x) - \sum a_i f_i(x)|$) but is computationally much more difficult to treat than the others; the second possibility is by far the easiest computationally, but is least satisfactory from a practical point of view because of the necessity of choosing $\mu$ (for example, if $\mathfrak{X}$ is a line segment, the optimum design will not be invariant under homeomorphisms of $\mathfrak{X}$ if $\mu$ is always chosen to be Lebesgue measure); the third possibility is a compromise between the first two and, as a first attack on the problem, is what we consider in this section, with $W(t) = t^2$. We note that a remark of [9, p. 215] indicates that Box and Hunter are considering the second approach for certain polynomial multiple regression problems when $W(t) = t^2$ and $\mu$ is Lebesgue measure on a Euclidean set. We note that it is a consequence of all three definitions of optimality discussed in this paragraph that an optimum design is optimum for all values of $\sigma^2$.

We shall not have to concern ourselves here with the choice of the function $g$: for example, the remarks of Section 2 extend here to show that, for a given design, if $\hat a_i$ is the b.l.e. of $a_i$, then $\sup_a E(\sum a_i f_i(x) - g(x))^2$ is a minimum for $g(x) = \sum \hat a_i f_i(x)$. We therefore assume this choice of $g$ in what follows. Thus, we are led to consider the minimization with respect to the design $d$ of the expression

(5.1) $$\max_x E\Big\{\sum_i (\hat a_i - a_i) f_i(x)\Big\}^2 = \sigma^2 \max_x f(x)' V_d f(x),$$

where we have written $f(x)$ for the vector of $f_i(x)$'s. Using again the representation of a design as a measure $\xi$, the analogue of $V_d$ is the inverse of the matrix $M(\xi)$ whose $(i, j)$th element is

(5.2) $$m_{ij}(\xi) = \int f_i(x) f_j(x)\, \xi(dx).$$

Thus, making an approximation like that of Section 2 in not requiring $n\xi$ to be integral, we define a design $\xi^*$ to be optimum for the problem of this section if $M(\xi^*)$ is nonsingular and

(5.3) $$\max_x f(x)' M(\xi^*)^{-1} f(x) = \min_\xi \max_x f(x)' M(\xi)^{-1} f(x).$$

It seems more difficult here than in Section 2 to give a useful general computing algorithm. We now describe one device which seems useful in many examples. Let $D_\xi$ be a nonsingular matrix such that the vector $g_\xi = D_\xi f$ consists of functions $g_{\xi,i}$ which are orthonormal with respect to $\xi$; it is clear that such a $D_\xi$ exists for any $\xi$ in $\Xi$ for which $M(\xi)$ is nonsingular. Since the $(i, j)$th element of $D_\xi M(\xi) D_\xi'$ is the integral with respect to $\xi$ of $g_{\xi,i}g_{\xi,j}$ (so that $D_\xi M(\xi) D_\xi'$ is the identity), we obtain

(5.4) $$f(x)' M(\xi)^{-1} f(x) = g_\xi(x)' \big(D_\xi M(\xi) D_\xi'\big)^{-1} g_\xi(x) = \sum_{i=1}^{k} [g_{\xi,i}(x)]^2.$$

(Since the left side of (5.4) does not depend on $D_\xi$, neither does the right side; thus, in searching for a $\xi$ which minimizes the maximum with respect to $x$ of (5.4), it suffices to consider for each $\xi$ only that $D_\xi$ and $g_\xi$ which are computationally most convenient.) Since the $g_{\xi,i}$'s are orthonormal with respect to $\xi$, the integral with respect to $\xi$ of the last expression of (5.4) is $k$, and this cannot be greater than the maximum with respect to $x$ of (5.4). Since this holds for every $\xi$ in $\Xi$, a sufficient condition for a given $\xi$ to be an optimum design is

(5.5) $$\max_x \sum_{i=1}^{k} [g_{\xi,i}(x)]^2 = k.$$

Of course, a necessary condition for (5.5) to be satisfied is that $\xi$ give measure one to the set of $x$ where $\sum_i [g_{\xi,i}(x)]^2 = k$, and it is useful to keep this in mind in examples.
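The device (5.4) and the sufficient condition (5.5) translate directly into code. One convenient choice of $D_\xi$ is $L^{-1}$, where $M(\xi) = LL'$ is the Cholesky factorization; then $D_\xi M(\xi) D_\xi'$ is the identity. This choice is ours, not one prescribed in the text. The sketch assumes polynomial regressors and checks (5.5) only on a finite grid of $[-1, 1]$, so it is a numerical check rather than a proof.

```python
import numpy as np

def variance_function(points, weights, xs, k):
    """f(x)' M(xi)^{-1} f(x) for f(x) = (1, x, ..., x^{k-1}), computed as sum_i g_{xi,i}(x)^2
    with g_xi = D_xi f and D_xi = L^{-1}, where M(xi) = L L' (Cholesky factorization)."""
    F = np.vander(points, k, increasing=True)
    M = F.T @ (weights[:, None] * F)
    L = np.linalg.cholesky(M)
    G = np.linalg.solve(L, np.vander(xs, k, increasing=True).T)   # column r holds g_xi(xs[r])
    return np.sum(G**2, axis=0)

def satisfies_5_5(points, weights, k, grid=np.linspace(-1.0, 1.0, 2001), tol=1e-9):
    """Check the sufficient condition (5.5) on a grid of X = [-1, 1] (grid is an assumption)."""
    return bool(np.max(variance_function(points, weights, grid, k)) <= k + tol)

# Quadratic regression (k = 3): equal weights on -1, 0, 1 versus weights 1/4, 1/2, 1/4.
pts = np.array([-1.0, 0.0, 1.0])
print(satisfies_5_5(pts, np.full(3, 1 / 3), 3), satisfies_5_5(pts, np.array([0.25, 0.5, 0.25]), 3))
```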

Suppose (5.5) is satisfied for a $\xi'$ concentrated on $k$ points, say $x_1, \ldots, x_k$, and write $\xi_j = \xi'(x_j)$. Then the $k \times k$ matrix whose $(i, j)$th element is $g_{\xi',i}(x_j)\xi_j^{1/2}$ has orthonormal rows and, hence, orthonormal columns: $\sum_i [g_{\xi',i}(x_j)]^2 \xi_j = 1$ for $1 \le j \le k$.
Hence, $\xi_j > 0$, and by (5.5) $\xi_j \ge 1/k$; since the $\xi_j$ sum to 1, each $\xi_j$ is $1/k$. We summarize our results.

THEOREM 5. If (5.5) holds, then $\xi$ is optimum.⁶ If (5.5) holds for a $\xi$ concentrated on $k$ points, then $\xi$ gives measure $1/k$ to each of these points.

If the setup is that of Example 3.1, it follows from the results of [10] cited in Section 2 that there exists an optimum $\xi$ concentrated on exactly $k$ points. This will not be true in general (see Example 5.2 below).

In the special case where $\mathcal{X}$ consists of $k$ points, the argument of the paragraph preceding the present theorem, applied to the distribution $\xi_i = 1/k$, $i = 1, \ldots, k$, shows that (5.5) is satisfied for this distribution, and hence the latter is optimum. Combining this with a remark which follows (4.3), we conclude that, when $\mathcal{X}$ consists of $k$ points, the design which puts mass $1/k$ on each point is the unique optimum design according to both the definition (4.3) for $s = k$ (the problem of Section 4) and the definition (5.3) (the problem of the present section).

Example 5.1. The setup of Example 3.1. It is possible to solve this problem by our methods, and such a solution was given in the original draft of this paper. In the meantime, however, a solution has been published by Guest [13], so that there is no point in repeating the details of our solution. An earlier discussion by Smith [14] gave details of designs up to $k = 7$. The optimum design assigns mass $1/k$ to the points $+1$, $-1$, and the roots of $L_{k-1}'(x) = 0$, where $L_{k-1}'$ is the derivative of the Legendre polynomial $L_{k-1}$ of degree $k - 1$. (5.5) is satisfied ([13], equation (10)). It therefore follows from Theorem 6 below that this design is also optimum in the sense of definition (4.3) for $s = k$ (problem of Section 4) for this setup; i.e., a special case of Theorem 6 asserts that Hoel's design [15] is the same as that of Guest [13].⁷ This last fact was noted by Hoel through an examination of the explicit results in the polynomial case.

Example 5.2. This example illustrates the use of Theorem 5 where the optimum $\xi$ is concentrated on more than $k$ points and does not give equal measure to all of them. Let $k = 2$ and let $\mathcal{X}$ consist of three points. Thus, we hereafter write the $f_i$ and $\xi$ and $S$ as triples, where $S(x) = \sum_i [g_{\xi,i}(x)]^2$. Suppose $f_1 = (1, 1, 0)$ and $f_2 = (0, 1, 2)$. For $\xi = (\xi_1, \xi_2, \xi_3)$, we obtain easily

$$S = (\xi_1\xi_2 + 4\xi_1\xi_3 + 4\xi_2\xi_3)^{-1}\,(\xi_2 + 4\xi_3,\ \xi_1 + 4\xi_3,\ 4\xi_1 + 4\xi_2).$$

We have $\sum_i \xi_i S_i = 2$, identically in $\xi$. Suppose $\xi_1 = 0$, $\xi_2 > 0$, $\xi_3 > 0$. Then either 1) $S_2 = S_3$, in which case $\xi_2 = \xi_3 = 1/2$ and $S_1 > S_2 = 2$, or 2) $\max(S_2, S_3) > 2$. Thus, in either case $\max_i S_i > 2$. A similar argument applies if either of the other $\xi_i$'s is 0, and two $\xi_i$'s can obviously not be 0. Thus, $\max_i S_i$ can be 2 only if all $\xi_i$ are positive and all $S_i$ are equal to 2. The unique optimum $\xi$ is thus easily seen to be $(4/15, 4/15, 7/15)$.
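The computations of Example 5.2 can be confirmed in exact arithmetic. The sketch below evaluates $S$ in two ways: from the closed form displayed above, and directly as $f(x)'M(\xi)^{-1}f(x)$ at the three points of $\mathcal{X}$. The function names are ours. For $\xi = (4/15, 4/15, 7/15)$ both give $S = (2, 2, 2)$. The identity $\sum_i \xi_i S_i = 2$ can be checked the same way for any $\xi$.

```python
from fractions import Fraction


def s_closed_form(xi):
    """S = (S_1, S_2, S_3) of Example 5.2 from the closed form in the text."""
    x1, x2, x3 = xi
    det = x1 * x2 + 4 * x1 * x3 + 4 * x2 * x3  # det M(xi)
    return ((x2 + 4 * x3) / det, (x1 + 4 * x3) / det, (4 * x1 + 4 * x2) / det)


def s_direct(xi):
    """The same S computed as f(x)' M(xi)^{-1} f(x) at each of the three points."""
    f = [(1, 0), (1, 1), (0, 2)]  # (f_1(x), f_2(x)) at the three points of X
    m11 = sum(w * a * a for w, (a, b) in zip(xi, f))
    m12 = sum(w * a * b for w, (a, b) in zip(xi, f))
    m22 = sum(w * b * b for w, (a, b) in zip(xi, f))
    det = m11 * m22 - m12 * m12
    # f' M^{-1} f with M^{-1} = [[m22, -m12], [-m12, m11]] / det
    return tuple((m22 * a * a - 2 * m12 * a * b + m11 * b * b) / det for a, b in f)


optimum = (Fraction(4, 15), Fraction(4, 15), Fraction(7, 15))
print(s_closed_form(optimum), s_direct(optimum))
```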


⁶ The converse of this statement is true. In fact, it will be proved in a subsequent paper (the results were obtained too late for inclusion in the present paper) that the following three statements are equivalent: (a) the design $\xi$ is optimum in the sense of Section 4 with $s = k$; (b) the design $\xi$ is optimum in the sense of Section 5; (c) the design $\xi$ satisfies (5.5).
⁷ This is a special case (for polynomial regression) of the result described in footnote 6.





OPTIMUM REGRESSION DESIGNS 293 




It is obvious how to give examples like those of Section 3 where the optimum $\xi$ is not unique, etc.

The argument just after (2.17) is easily modified to apply to the expression on the left side of (5.1), so that we can again conclude that there exists an optimum symmetrical $\xi$ if (2.15) and the obvious analogue of the condition of the line above (2.15) hold. The question of the number of points at which an optimum symmetrical $\xi$ will be concentrated is difficult, as is the corresponding question for general optimum $\xi$.

The modification of (2.18) can be made in the problem of this section, exactly 
as in Section 2. 

We shall conclude this section with a result which sheds some light on the connection between the problem of Section 4 for $s = k$ and the problem of the present section. This result has already been cited in Example 5.1.

THEOREM 6.⁸ If the design which puts mass $1/k$ on each of $k$ points satisfies (5.5) and is optimum in the sense of (5.3) (problem of Section 5), then this design is also optimum in the sense of (4.3) (problem of Section 4) with $s = k$.

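PROOF: Let $\xi_0$ be a design, optimum in the sense of (5.3), such that $\xi_0$ assigns mass $1/k$ to each of the points $x_1, \ldots, x_k$ in $\mathcal{X}$, and such that (5.5) is satisfied for $\xi = \xi_0$. Since a design optimum for the problem of Section 4 with $s = k$ is invariant under a linear transformation on the $f_i$, it will suffice to prove that $\xi_0$ is optimum for this problem assuming $f_i = g_{\xi_0,i}$; henceforth we make this assumption. Thus

$$\max_x \sum_{i=1}^{k} f_i^2(x) = k \tag{5.6}$$

and

$$m_{ij}(\xi_0) = \delta_{ij},$$

and we have to prove that $\xi_0$ maximizes $\det M(\xi)$. Now from (5.6), for any $\xi$ we have

$$\sum_{i=1}^{k} m_{ii}(\xi) = \int \sum_{i=1}^{k} f_i^2(x)\, \xi(dx) \le k,$$

and hence, by Hadamard's inequality for the non-negative definite matrix $M(\xi)$ followed by the inequality between the geometric and arithmetic means,

$$\det M(\xi) \le \prod_{i=1}^{k} m_{ii}(\xi) \le \left(\frac{1}{k}\sum_{i=1}^{k} m_{ii}(\xi)\right)^{k} \le 1 = \det M(\xi_0).$$

This proves the theorem.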
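The chain of inequalities in this proof can be spot-checked numerically. Consider the regression functions $1, x, x^2$ on $[-1, 1]$. By Theorem 6 together with the case $k = 3$ of Example 5.1, mass $1/3$ at $-1, 0, 1$ should maximize $\det M(\xi)$, with maximum value $4/27$. The sketch below compares that value with the determinants of many randomly generated designs. The random design generator, the seed, and the number of trials are arbitrary choices of ours. A search of this kind can illustrate the theorem but cannot prove it.

```python
import random


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def info_matrix(points, weights):
    """M(xi) for f(x) = (1, x, x^2) and a finitely supported design."""
    m = [[0.0] * 3 for _ in range(3)]
    for x, w in zip(points, weights):
        fx = (1.0, x, x * x)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * fx[i] * fx[j]
    return m


def random_design(rng, n_points):
    """A hypothetical random design: n_points uniform on [-1, 1], random masses."""
    points = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    raw = [rng.random() + 1e-9 for _ in range(n_points)]
    total = sum(raw)
    return points, [v / total for v in raw]


best = det3(info_matrix([-1.0, 0.0, 1.0], [1 / 3] * 3))  # equals 4/27
rng = random.Random(0)
gaps = [best - det3(info_matrix(*random_design(rng, rng.randint(3, 8))))
        for _ in range(10_000)]
print(best, min(gaps))  # Theorem 6 predicts min(gaps) >= 0
```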
The authors are obliged to Professor G. Elfving for helpful comments. 
⁸ Theorem 6 is a very special case of the results announced in footnote 6.

REFERENCES

[1] J. KIEFER, "On the nonrandomized optimality and randomized nonoptimality of symmetrical designs", Ann. Math. Stat., Vol. 29 (1958), pp. 675-699.
[2] N. I. ACHIESER, Theory of Approximation, Ungar Pub. Co., New York, 1956.
[3] G. ELFVING, "Optimum allocation in linear regression theory", Ann. Math. Stat., Vol. 23 (1952), pp. 255-262.
[4] G. ELFVING, "Geometric allocation theory", Skand. Akt., 1955, pp. 170-190.
[5] H. CHERNOFF, "Locally optimum designs for estimating parameters", Ann. Math. Stat., Vol. 24 (1953), pp. 586-602.
[6] H. F. BOHNENBLUST, S. KARLIN, AND L. SHAPLEY, "Games with continuous, convex payoff", Ann. Math. Studies, No. 24, pp. 181-192.
[7] J. KIEFER, "Invariance, minimax sequential estimation, and continuous time processes", Ann. Math. Stat., Vol. 28 (1957), pp. 573-601.
[8] R. A. FISHER, Statistical Methods for Research Workers, tenth edition, Oliver and Boyd, Edinburgh, 1946.
[9] G. E. P. BOX AND J. S. HUNTER, "Multi-factor experimental designs for exploring response surfaces", Ann. Math. Stat., Vol. 28 (1957), pp. 195-241.
[10] J. A. SHOHAT AND J. D. TAMARKIN, The Problem of Moments, Math. Surveys, No. 1, Amer. Math. Soc., New York, 1943.
[11] E. J. WILLIAMS, "Optimum allocation for estimation of polynomial regression" (abstract), Biometrics, Vol. 14 (1958), p. 573.
[12] S. EHRENFELD, "Complete class theorems in experimental design", Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1955.
[13] P. G. GUEST, "The spacing of observations in polynomial regression", Ann. Math. Stat., Vol. 29 (1958), pp. 294-299.
[14] K. SMITH, "On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations", Biometrika, Vol. 12 (1918), pp. 1-85.
[15] P. G. HOEL, "Efficiency problems in polynomial estimation", Ann. Math. Stat., Vol. 29 (1958), pp. 1134-1146.





SOME OPTIMUM WEIGHING DESIGNS 


By DAMARAJU RAGHAVARAO
University of Bombay 
1. Introduction and summary. Suppose we are given $N$ objects to be weighed in $N$ weighings with a chemical balance having no bias. Let
$x_{ij} = 1$ if the $j$th object is placed in the left pan in the $i$th weighing,
$x_{ij} = -1$ if the $j$th object is placed in the right pan in the $i$th weighing,
$x_{ij} = 0$ if the $j$th object is not weighed in the $i$th weighing.
The $N$th order matrix $X = (x_{ij})$ is known as the design matrix. Also let $y_i$ be the result recorded in the $i$th weighing, $e_i$ be the error in this result and $w_j$ be the true weight of the $j$th object, so that we have the $N$ equations

$$x_{i1}w_1 + x_{i2}w_2 + \cdots + x_{iN}w_N = y_i + e_i, \qquad i = 1, \ldots, N.$$


We assume $X$ to be a non-singular matrix. The method of Least Squares or theory of Linear Estimation gives the estimated weights $(\hat w_j)$ by the equation

$$\hat w = (X'X)^{-1} X'Y,$$

where $Y$ is the column vector of the observations and $\hat w$ is the column vector of the estimated weights.
If $\sigma^2$ is the variance of each weighing, then

$$\operatorname{Var}(\hat w) = (X'X)^{-1}\sigma^2 = (c_{ij})\,\sigma^2,$$

where $(c_{ij})$ is the inverse matrix of $(X'X)$.
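For a given design matrix, the estimates and the matrix $(c_{ij})$ are easy to compute. The Python sketch below does so in exact rational arithmetic. The function names and the example are our assumptions: the example uses a 4 × 4 Hadamard matrix as $X$, with hypothetical true weights and error-free readings. For this design $X'X = 4I_4$, so $c_{ii} = 1/N$ and $c_{ij} = 0$, which is the property of Hadamard matrices mentioned below.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_weights(x, y):
    """w_hat = (X'X)^{-1} X'Y together with C = (c_ij) = (X'X)^{-1}."""
    xt = transpose(x)
    c = inverse(matmul(xt, x))
    w_hat = [row[0] for row in matmul(c, matmul(xt, [[v] for v in y]))]
    return w_hat, c


# Hypothetical example: a 4 x 4 Hadamard matrix used as the design matrix X.
X = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
true_w = [3, 5, 2, 7]
Y = [sum(a * b for a, b in zip(row, true_w)) for row in X]  # readings without error
w_hat, C = estimate_weights(X, Y)
print(w_hat, C[0][0])  # c_11 = 1/N for this design
```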

An expository article reviewing the work done in weighing designs is given by 
Banerjee [2]. 

Kishen [4] treats the reciprocal of the increase in variance resulting from the adoption of any design other than the most efficient design, with mean variance $\sigma^2/N$, as the efficiency of the design. When all the $c_{ii}$ are equal, as for the designs considered in this paper, this efficiency can be measured by $1/(N c_{ii})$.

Mood [5] gives an alternative definition for the best weighing design. In his view the best weighing design should give the smallest confidence region in the $(\hat w_i,\ i = 1, \ldots, N)$ space for the estimates of the weights. Hence a design will be called best if the determinant of the matrix $(c_{ij})$ is minimised.

In this paper we follow Kishen’s definition in obtaining the best weighing 
designs. 

Hotelling [3] proved that for the best weighing design $c_{ii} = 1/N$ and $c_{ij} = 0$ $(i \ne j)$. The weighing designs for which $c_{ii} = 1/N$ and $c_{ij} = 0$ are best in the sense of both Kishen and Mood. Later Mood proved that the above property is


Received October 24, 1958; revised December 26, 1958. 









satisfied by Hadamard matrices. Plackett and Burman [6] have constructed Hadamard matrices, $H_N$, up to and including $N = 100$, excepting $N = 92$. It may be remarked here that a necessary condition for the existence of $H_N$ is $N \equiv 0 \pmod 4$, with the exception of $N = 2$. It is not known whether this condition is sufficient or not.

In this paper, the best weighing designs are obtained in the cases (i) $N$ is odd and (ii) $N \equiv 2 \pmod 4$, subject to the conditions:

i) The variances of the estimated weights are equal; and 

ii) The estimated weights are equally correlated. 

The 2nd condition here is the same as that of Banerjee [1]. 


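2. Some theorems relating to the best weighing designs. With the conditions made above, the $(X'X)$ matrix takes the form

$$X'X = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \ddots & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}. \tag{2.1}$$

Now

$$\det(X'X) = \{\det(X)\}^2 = (r - \lambda)^{N-1}\{r + \lambda(N - 1)\}.$$

Therefore,

$$\det(X) = \pm (r - \lambda)^{(N-1)/2}\{r + \lambda(N - 1)\}^{1/2}. \tag{2.2}$$

The $\det(X)$ is real and not equal to zero. Hence we have

$$r > \lambda \tag{2.3}$$

and

$$r + \lambda(N - 1) > 0. \tag{2.4}$$

Relation (2.4) holds when $\lambda$ is non-negative, and also when $\lambda = -1$; in the latter case $r = N$, since $r \le N$.
Therefore, in this paper, we consider only the values of $r$ and $\lambda$ satisfying

$$r > \lambda \ge 0, \quad \text{or} \quad r = N,\ \lambda = -1.$$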


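When the matrix $(X'X)$ is of the form (2.1), the variance of the estimated weight is

$$\frac{r + \lambda(N - 2)}{(r - \lambda)\{r + \lambda(N - 1)\}}\,\sigma^2. \tag{2.5}$$

Therefore, the efficiency of the weighing design is

$$\frac{(r - \lambda)\{r + \lambda(N - 1)\}}{N\{r + \lambda(N - 2)\}} = f(r, \lambda), \text{ say}. \tag{2.6}$$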
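Formulas (2.5) and (2.6) can be tabulated directly. The sketch below evaluates three quantities:

- the variance factor of (2.5);
- the efficiency $f(r, \lambda)$ of (2.6);
- the covariance factor $-\lambda/[(r - \lambda)\{r + \lambda(N - 1)\}]$.

The covariance factor is not written out in the text. It comes from the same inverse, $(X'X)^{-1} = \{I_N - \lambda E_{NN}/(r + \lambda(N - 1))\}/(r - \lambda)$, where $E_{NN}$ has every element equal to 1. The function names are ours. With $N = r = 5$ and $\lambda = 1$, the sketch reproduces the values $2\sigma^2/9$, $-\sigma^2/36$ and $9/10$ quoted for $P_5$ in Section 5.

```python
from fractions import Fraction


def variance_factor(n, r, lam):
    """c_ii from (2.5): Var(w_hat_i) = c_ii * sigma^2 when X'X has the form (2.1)."""
    return Fraction(r + lam * (n - 2), (r - lam) * (r + lam * (n - 1)))


def covariance_factor(n, r, lam):
    """Off-diagonal c_ij of the inverse of (2.1)."""
    return Fraction(-lam, (r - lam) * (r + lam * (n - 1)))


def efficiency(n, r, lam):
    """f(r, lambda) of (2.6): the ratio of sigma^2 / N to the variance (2.5)."""
    return 1 / (n * variance_factor(n, r, lam))


# P_5: N = r = 5, lambda = 1
print(variance_factor(5, 5, 1), covariance_factor(5, 5, 1), efficiency(5, 5, 1))
```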







LEMMA 2.1.

i) Let $r = N$. Then $\lambda$ cannot be even (including zero) when $N$ is odd, and $\lambda$ cannot be odd (including $-1$) when $N$ is even.

ii) Let $r = N - 1$. Then $\lambda$ cannot be even (including zero) when $N$ is odd, and $\lambda$ cannot be odd when $N$ is even.

PROOF. Let $x_i$ and $x_j$ be any two column vectors of the design matrix $X$.

i) When $r = N$, $x_i'x_j$ will have $N$ terms, each term being either $+1$ or $-1$. Since $x_i'x_j = \lambda$, amongst the $N$ terms $(N - |\lambda|)$ terms sum to zero. Hence $N$ and $|\lambda|$ should either both be odd or both be even, and the statement follows.

ii) When $r = N - 1$, each column of $X$ contains exactly one zero, and $x_i'x_j$ will have $N$ terms, each term being $+1$ or $-1$ or $0$. Since $x_i'x_j = \lambda$, amongst the $N$ terms $(N - \lambda)$ terms sum to zero. If $N$ is odd and $\lambda$ is even, $(N - \lambda)$ will become odd, and the $(N - \lambda)$ terms cannot sum to zero unless there is a single zero term. $x_i$ and $x_j$ will contribute a single zero term to $x_i'x_j$ when and only when the zeros of $x_i$ and $x_j$ are in the same row. This is also the case for any two columns of $X$. Hence, if $N$ is odd and $\lambda$ is even, we get a row of zeros in $X$, and in this case $\det(X) = 0$, contrary to our assumption. Therefore, $\lambda$ cannot be even when $N$ is odd.

Similarly we can show that $\lambda$ cannot be odd when $N$ is even.


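THEOREM 2.1. When $N$ is odd the best weighing design $X$ is that for which $X'X$ is of the form (2.1) with $r = N$ and $\lambda = 1$.

PROOF. Here $f(N, 1) = (2N - 1)/(2N)$. For $r > \lambda \ge 0$, or $r = N$, $\lambda = -1$,

$$\begin{aligned}
f(N, 1) - f(r, \lambda) &= \frac{2N - 1}{2N} - \frac{(r - \lambda)\{r + \lambda(N - 1)\}}{N\{r + \lambda(N - 2)\}} \\
&= \frac{(2N - 1)(N - 2)\lambda + (2N - 1)r - 2(r - \lambda)\{r + \lambda(N - 1)\}}{2N\{r + \lambda(N - 2)\}} \\
&= \frac{(N - 1)\lambda(2N - 2r + 2\lambda - 1) + (2N - 2r - 1)(r - \lambda)}{2N\{r + \lambda(N - 2)\}} > 0
\end{aligned}$$

when $r < N$. When $r = N$,

$$f(N, 1) - f(N, \lambda) = \frac{(\lambda - 1)\{2(N - 1)\lambda + N\}}{2N\{N + \lambda(N - 2)\}}. \tag{2.8}$$

(2.8) is again greater than zero for all values of $\lambda$ other than 1 (which gives the design of the theorem itself) and zero; for $\lambda = 0$ it is less than zero. But Lemma 2.1 proves that $\lambda$ cannot be zero since $N$ is odd. Hence the best weighing design in this case has efficiency $f(N, 1)$. The theorem is thus proved.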


THEOREM 2.2. When $N \equiv 2 \pmod 4$ the best weighing design $X$ is that for which

$$X'X = \operatorname{diag}\{(N - 1), (N - 1), \ldots, (N - 1)\} \quad (N \text{ terms}).$$

PROOF. For $r > \lambda \ge 0$, or $r = N$, $\lambda = -1$,

$$\begin{aligned}
f(N - 1, 0) - f(r, \lambda) &= \frac{N - 1}{N} - \frac{(r - \lambda)\{r + \lambda(N - 1)\}}{N\{r + \lambda(N - 2)\}} \\
&= \frac{r(N - 1) + (N - 1)(N - 2)\lambda - r(r - \lambda) - (N - 1)(r - \lambda)\lambda}{N\{r + \lambda(N - 2)\}} \\
&= \frac{r(N - r + \lambda - 1) + \lambda(N - 1)(N - r + \lambda - 2)}{N\{r + \lambda(N - 2)\}} \ge 0
\end{aligned}$$

when $r < N$, with equality only for $r = N - 1$, $\lambda = 0$. When $r = N$,

$$f(N - 1, 0) - f(N, \lambda) = \frac{N(\lambda - 1) + (N - 1)(\lambda - 2)\lambda}{N\{N + \lambda(N - 2)\}}. \tag{2.9}$$

(2.9) is greater than zero when $\lambda \ge 2$ and when $\lambda = -1$, and it is less than zero when $\lambda = 0$ or 1. Lemma 2.1 proves that $\lambda = 1$ cannot occur, and it is known that a Hadamard matrix cannot exist in this case, so $\lambda$ cannot be zero. Therefore, the best weighing design in this case has efficiency $f(N - 1, 0)$. The theorem is hence proved.
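The comparisons in Theorems 2.1 and 2.2 can also be checked by enumeration. The sketch below does three things:

1. It lists the pairs $(r, \lambda)$ allowed by the restrictions following (2.4) and by Lemma 2.1.
2. It drops the Hadamard case $r = N$, $\lambda = 0$ when $N \equiv 2 \pmod 4$.
3. It returns the pair with the largest efficiency $f(r, \lambda)$.

It checks only the arithmetic of the two theorems; whether a design with a given $(r, \lambda)$ actually exists is not examined. The function names are ours.

```python
from fractions import Fraction


def efficiency(n, r, lam):
    """f(r, lambda) of (2.6) for designs whose X'X has the form (2.1)."""
    return Fraction((r - lam) * (r + lam * (n - 1)), n * (r + lam * (n - 2)))


def admissible_pairs(n):
    """(r, lambda) with r > lambda >= 0 and r <= N, or r = N and lambda = -1,
    minus the pairs excluded by Lemma 2.1 (existence is not checked)."""
    candidates = [(r, lam) for r in range(1, n + 1) for lam in range(0, r)]
    candidates.append((n, -1))
    kept = []
    for r, lam in candidates:
        if r in (n, n - 1):
            # Lemma 2.1: lambda is not even when N is odd, and not odd when N is even
            if n % 2 == 1 and lam % 2 == 0:
                continue
            if n % 2 == 0 and lam % 2 != 0:
                continue
        if n % 4 == 2 and (r, lam) == (n, 0):
            continue  # no Hadamard matrix of order N = 2 (mod 4), N > 2
        kept.append((r, lam))
    return kept


def best_pair(n):
    """The admissible (r, lambda) with the largest efficiency."""
    return max(admissible_pairs(n), key=lambda pair: efficiency(n, *pair))


for n in (5, 7, 9, 6, 10, 14):
    print(n, best_pair(n))
```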


3. $P_N$ matrices.
DEFINITION 3.1. A $P_N$ matrix is an $N$th order matrix with elements $+1$ and $-1$ such that

$$P_N' P_N = (N - 1) I_N + E_{NN},$$

where $I_N$ is the identity matrix of order $N$ and $E_{NN}$ is an $N$th order matrix with positive unit elements everywhere.

It is obvious from Theorem 2.1 above that the $P_N$ matrix, whenever it exists, is the best weighing design; by Lemma 2.1 (i), $N$ must then be odd.

THEOREM 3.1. A necessary condition for the existence of $P_N$ is that

$$N = \frac{d^2 + 1}{2},$$

where $d$ is an odd integer.

PROOF. $\det\{P_N'P_N\} = \{\det P_N\}^2 = (2N - 1)(N - 1)^{N-1}$. Therefore $\det(P_N) = \pm(2N - 1)^{1/2}(N - 1)^{(N-1)/2}$. Also, since $P_N$ is a matrix with integral elements, $\det(P_N)$ is an integer. Since $N$ is odd, $(N - 1)^{(N-1)/2}$ is an integer, so $(2N - 1)^{1/2}$ must be rational.

Hence $(2N - 1)$ should be a perfect square. Let

$$2N - 1 = d^2.$$

Since $d^2 = 2N - 1$ is odd, $d$ is an odd integer, and thus the theorem is proved.
THEOREM 3.2. If a Balanced Incomplete Block Design exists with parameters

$$v^* = b^* = N, \qquad r^* = k^* = (N + d)/2, \qquad \lambda^* = (N + 2d + 1)/4,$$

where $N = (d^2 + 1)/2$ as in Theorem 3.1, then, by changing the zeros into $-1$'s in the incidence matrix of the incomplete block design, we get a $P_N$ matrix.

PROOF. Let the column vectors of the incidence matrix, after the 0's are changed to $-1$, be $p_1, p_2, \ldots, p_N$.

The negative contribution to $p_i'p_j$ is $2(r^* - \lambda^*) = (N - 1)/2$ $(i, j = 1, 2, \ldots, N;\ i \ne j)$.

Therefore, the positive contribution to $p_i'p_j$ is $(N + 1)/2$. Hence $p_i'p_j = 1$, while $p_i'p_i = N$. Thus the theorem is established.
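Theorem 3.2 can be tried on two small cases allowed by Theorem 3.1.

- For $N = 5$ ($d = 3$) the required design has $k^* = 4$ and $\lambda^* = 3$. Its blocks can be taken to be the complements of single points.
- For $N = 13$ ($d = 5$) the required design has $k^* = 9$ and $\lambda^* = 6$. One such design is the complement of the cyclic design generated by the difference set $\{0, 1, 3, 9\}$ modulo 13, which is the projective plane of order 3.

These choices are ours and are for illustration only; the resulting $P_{13}$ need not coincide with the one printed in Section 5. The sketch changes the zeros into $-1$'s and checks $P_N'P_N = (N - 1)I_N + E_{NN}$.

```python
def cyclic_incidence(n, base_block):
    """Incidence matrix (rows = blocks, columns = points) of the cyclic design
    developed from base_block modulo n: block i is {b + i mod n : b in base_block}."""
    return [[1 if (j - i) % n in base_block else 0 for j in range(n)] for i in range(n)]


def to_plus_minus(incidence):
    """Theorem 3.2: change the zeros of the incidence matrix into -1's."""
    return [[1 if v else -1 for v in row] for row in incidence]


def gram(p):
    """P'P, whose (i, j) entry is the inner product of columns i and j."""
    n = len(p[0])
    return [[sum(row[i] * row[j] for row in p) for j in range(n)] for i in range(n)]


def is_p_matrix(p):
    """Check P'P = (N - 1) I + E, i.e. N on the diagonal and 1 elsewhere."""
    n = len(p)
    g = gram(p)
    return all(g[i][j] == (n if i == j else 1) for i in range(n) for j in range(n))


# N = 5: each block omits one point (v* = b* = 5, k* = 4, lambda* = 3)
p5 = to_plus_minus(cyclic_incidence(5, {1, 2, 3, 4}))
# N = 13: complement of the (13, 4, 1) difference set {0, 1, 3, 9} mod 13
p13 = to_plus_minus(cyclic_incidence(13, set(range(13)) - {0, 1, 3, 9}))
print(is_p_matrix(p5), is_p_matrix(p13))
```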


4. $S_N$ matrices. Williamson [7] proved that when

$$N = p^h + 1,$$

where $p$ is an odd prime and $h$ is a positive integer such that $p^h \equiv 1 \pmod 4$, then a symmetric matrix $S_N$ exists such that

$$S_N^2 = (N - 1) I_N,$$

where $I_N$ is the $N$th order identity matrix. Since then $N \equiv 2 \pmod 4$ and $S_N'S_N = S_N^2$ is of the form (2.1) with $r = N - 1$, $\lambda = 0$, Theorem 2.2 shows that in that case the $S_N$ matrix can be taken as our best weighing design. The construction of the $S_N$ matrices is based on Galois Fields and the Legendre function $\chi$, and it is discussed in detail in [7].
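For $h = 1$, a matrix with the stated properties can be produced by the Paley-type construction sketched below in Python. It is our illustration and is not claimed to reproduce the construction of [7] in detail. Prime powers with $h > 1$ need arithmetic in $GF(p^h)$ and are not covered. The construction has three steps:

1. Let $\chi$ be the Legendre symbol modulo a prime $p \equiv 1 \pmod 4$, and take the $p \times p$ matrix $Q$ with $Q_{ij} = \chi(j - i)$.
2. Border $Q$ with a row and a column of 1's.
3. Put 0 in the corner.

Because $\chi(-1) = 1$ for such $p$, the result is symmetric. It satisfies $S_N^2 = (N - 1)I_N$ with $N = p + 1$.

```python
def legendre(a, p):
    """Legendre symbol chi(a) modulo an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def paley_s_matrix(p):
    """Symmetric S_N of order N = p + 1 with zero diagonal, for a prime p = 1 mod 4."""
    if p % 4 != 1:
        raise ValueError("p must be a prime congruent to 1 mod 4")
    n = p + 1
    s = [[0] * n for _ in range(n)]
    for i in range(1, n):
        s[0][i] = s[i][0] = 1  # border of 1's
        for j in range(1, n):
            s[i][j] = legendre(j - i, p)  # chi(0) = 0 puts zeros on the diagonal
    return s


def square(s):
    n = len(s)
    return [[sum(s[i][t] * s[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


for p in (5, 13, 17):
    s = paley_s_matrix(p)
    n = p + 1
    ok = square(s) == [[(n - 1) if i == j else 0 for j in range(n)] for i in range(n)]
    symmetric = all(s[i][j] == s[j][i] for i in range(n) for j in range(n))
    print(n, ok, symmetric)
```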


5. Numerical examples. Now we construct some designs that belong to the $P_N$ and $S_N$ series. Among the designs given below, $P_5$ is proved to be the best design by Mood. A similar type of $S$ matrix is constructed by Banerjee [1] intuitively.


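$$P_5 = \begin{pmatrix} -1 & 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 & 1 \\ 1 & 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & -1 \end{pmatrix}$$

Variance of each estimated weight = $2\sigma^2/9$.
Covariance of each pair of estimated weights = $-\sigma^2/36$.
Efficiency = 9/10.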


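$$S_6 = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & -1 & -1 & 1 \\ 1 & 1 & 0 & 1 & -1 & -1 \\ 1 & -1 & 1 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 & 0 & 1 \\ 1 & 1 & -1 & -1 & 1 & 0 \end{pmatrix}$$

Variance of each estimated weight = $\sigma^2/5$.
Covariance of each pair of estimated weights = 0.
Efficiency = 5/6.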





$$S_{10} = \begin{pmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & 0 & 1 & -1 & 1 & 1 & -1 & -1 & -1 \\
1 & -1 & 1 & 0 & -1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 0 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & 1 & 1 & 0 & -1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 & -1 & 0 & 1 & -1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & 0 & -1 & 1 \\
1 & 1 & -1 & 1 & 1 & -1 & -1 & -1 & 0 & 1 \\
1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 & 1 & 0
\end{pmatrix}$$

Variance of each estimated weight = $\sigma^2/9$.
Covariance of each pair of estimated weights = 0.
Efficiency = 9/10.
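The sketch below enters $S_{10}$ as a list of rows (our transcription) and checks that it is symmetric with $S_{10}'S_{10} = 9I_{10}$. By (2.5) and (2.6) with $N = 10$, $r = 9$ and $\lambda = 0$, this property gives the quoted variance $\sigma^2/9$, the zero covariances, and the efficiency $9/10$.

```python
S10 = [
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 0, 1, -1, 1, 1, -1, -1, -1],
    [1, -1, 1, 0, -1, 1, -1, -1, 1, 1],
    [1, 1, -1, -1, 0, 1, -1, 1, 1, -1],
    [1, -1, 1, 1, 1, 0, -1, 1, -1, -1],
    [1, 1, 1, -1, -1, -1, 0, 1, -1, 1],
    [1, -1, -1, -1, 1, 1, 1, 0, -1, 1],
    [1, 1, -1, 1, 1, -1, -1, -1, 0, 1],
    [1, -1, -1, 1, -1, -1, 1, 1, 1, 0],
]

n = len(S10)
symmetric = all(S10[i][j] == S10[j][i] for i in range(n) for j in range(n))
# X'X for X = S10: entry (i, j) is the inner product of columns i and j
gram = [[sum(row[i] * row[j] for row in S10) for j in range(n)] for i in range(n)]
orthogonal = all(gram[i][j] == (9 if i == j else 0) for i in range(n) for j in range(n))
print(symmetric, orthogonal)  # X'X = 9 I: variance sigma^2/9, zero covariances
```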
$P_{13}$ = [13 × 13 matrix with entries $\pm 1$; the individual entries are not legible in this copy.]

Variance of each estimated weight = $2\sigma^2/25$.
Covariance of each pair of estimated weights = $-\sigma^2/300$.
Efficiency = 25/26.
$S_{14}$ = [14 × 14 symmetric matrix with zero diagonal and entries $\pm 1$ elsewhere; the individual entries are not legible in this copy.]

Variance of each estimated weight = $\sigma^2/13$.
Covariance of each pair of estimated weights = 0.
Efficiency = 13/14.





$S_{18}$ = [18 × 18 matrix; the individual entries are not legible in this copy.]

Variance of each estimated weight = $\sigma^2/17$.
Covariance of each pair of estimated weights = 0.
Efficiency = 17/18.

$P_{25}$ = [25 × 25 matrix; the individual entries are not legible in this copy.]

Variance of each estimated weight = $2\sigma^2/49$.
Covariance of each pair of estimated weights = $-\sigma^2/1176$.
Efficiency = 49/50.




Acknowledgement. My sincere thanks are due to Prof. M. C. Chakrabarti 
for suggesting the problem to me and guiding me in preparing this paper. 


REFERENCES
[1] K. S. BANERJEE, "Some contributions to Hotelling's weighing designs," Sankhyā, Vol. 10 (1950), pp. 371-382.
[2] K. S. BANERJEE, "Weighing designs," Calcutta Stat. Assn. Bull., Vol. 3 (1950-51), pp. 64-76.
[3] H. HOTELLING, "Some improvements in weighing and other experimental techniques," Ann. Math. Stat., Vol. 15 (1944), pp. 297-306.
[4] K. KISHEN, "On the design of experiments for weighing and making other types of measurements," Ann. Math. Stat., Vol. 16 (1945), pp. 294-300.
[5] A. M. MOOD, "On Hotelling's weighing problem," Ann. Math. Stat., Vol. 17 (1946), pp. 432-446.
[6] R. L. PLACKETT AND J. P. BURMAN, "The design of optimum multifactorial experiments," Biometrika, Vol. 33 (1946), pp. 305-325.
[7] J. WILLIAMSON, "Hadamard's determinant theorem and the sum of four squares," Duke Math. Jour., Vol. 11 (1944), pp. 65-82.








SOME CONTRIBUTIONS TO ANOVA IN ONE OR MORE 
DIMENSIONS: I 


By S. N. ROY AND R. GNANADESIKAN¹


University of North Carolina 


0. Introduction and Summary. Two models are considered in detail which are 
the Models I and II of ANOVA in the terminology of Eisenhart [2]. The present 
paper, which deals with the one dimensional or univariate case, and its sequel, 
which will deal with the multidimensional or multivariate case, seek to give a 
unified general treatment, using matrix methods, of certain problems under the 
two models of ANOVA. Section 1 of each paper, which deals with Model I, is 
of the nature of a résumé giving the main results of a general treatment discussed 
elsewhere [10, 11, 12] by one of the authors. Section 2 of each paper, which deals 
with Model II or variance components model, is self-contained, and presents a 
natural tie-up between the analyses under the two models for a k-way classifica- 
tion. Results in estimation, testing of hypotheses and confidence bounds are 
presented, although the main emphasis is on the results in confidence bounds 
(simultaneous and/or separate) on meaningful parametric functions which are 
physically natural and mathematically convenient measures of departure from 
customary null hypotheses. 

It will be seen that a mixed model, which would include both Models I and II 
as special cases, can be defined, and the associated problems can be studied by 
using methods which are, essentially, a combination of the methods given for the 
separate models in Sections 1 and 2, respectively, of this paper. Since nothing 
essentially new is involved in such a study, this paper does not explicitly dis- 
cuss it. 

Unless otherwise stated, capital letters will denote matrices and small letters in boldface will denote column vectors. Such letters when primed denote transposes. For instance, $A(p \times q)$ denotes a matrix with $p$ rows and $q$ columns, $A'(q \times p)$ denotes the transpose of $A$, $a(p \times 1)$ denotes a column vector with $p$ elements and $a'(1 \times p)$, the transpose of $a$, is a row vector. In particular, $I(p)$ will denote the identity matrix of order $p$, and $0(p \times 1)$ and $0(p \times q)$ will stand, respectively, for the null vector of order $p$ and the null matrix with $p$ rows and $q$ columns.


1. Résumé of problems and results under the univariate Model I of ANOVA.
1.1 The Model I. Let $x'(1 \times n) = (x_1, x_2, \ldots, x_n)$ be a set of $n$ observable stochastic variates such that

$$x(n \times 1) = A(n \times m)\,\xi(m \times 1) + e(n \times 1), \qquad m < n, \tag{1.1.1}$$


Received July 24, 1957.
¹ This research was sponsored by the United States Air Force through the Office of Scientific Research of the Air Research and Development Command.









where $A(n \times m)$, to be called the design matrix, is a matrix whose elements are constants given by the design of the experiment and is of rank $r \le m < n$, and where,

(i) $\xi(m \times 1)$ is a set of unknown parameters;

(ii) $e(n \times 1)$, whose elements are physically of the nature of errors, is a random sample from the normal population $N(0, \sigma^2)$.
Under this model it is easily seen that $x(n \times 1)$ is a set of $n$ normal independent (and hence uncorrelated) variates with a common variance $\sigma^2$ and the respective means given by,

$$E(x) = A(n \times m)\,\xi(m \times 1). \tag{1.1.2}$$


It may be noted that the assumption of normality in (ii) above is not necessary for problems of linear estimation, and the results presented below on linear estimation are all valid merely under the assumption that $e(n \times 1)$ (hence $x(n \times 1)$) is a set of uncorrelated stochastic variates with a common variance $\sigma^2$. Next, since $A$ is of rank $r \le m < n$, we can find a basis $A_1(n \times r)$ of $A$, which, without any loss of generality and by renumbering the columns of $A$ and the elements of $\xi$, can be taken to be the first $r$ columns of $A$, and we may write (1.1.2) as,

$$E(x) = \begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}, \tag{1.1.3}$$

where $A_1$ is $n \times r$, $A_2$ is $n \times (m - r)$, $\xi_1$ is $r \times 1$ and $\xi_2$ is $(m - r) \times 1$.


1.2 Linear estimation. We seek an unbiased minimum variance linear estimate $b'(1 \times n)x(n \times 1)$ of a given linear function $c'(1 \times m)\xi(m \times 1)$ of the unknown parameters $\xi(m \times 1)$. The partitioning of $A$ into $A_1$ and $A_2$ determines the partitioning of $\xi$ into $\xi_1$ and $\xi_2$, and the partitioning into $\xi_1$ and $\xi_2$ determines that of $c'$ into $c_1'$ and $c_2'$, so that we can rewrite $c'\xi$ as $c_1'\xi_1 + c_2'\xi_2$. The main results in linear estimation follow [11, pp. 77-81].

(1.2.1) All the following results are invariant under the choice of a basis $A_1$ of $A$ (with a consequent determination of $\xi_1$ and $c_1$).

The necessary and sufficient condition on $c$ that an unbiased linear estimate $a'x$ of $c'\xi$ exists (in which case $c'\xi$ will be said to be estimable and the corresponding condition will be said to be the estimability condition) is that,

$$c_2' = c_1'(A_1'A_1)^{-1}A_1'A_2, \tag{1.2.2}$$

or, in other words, that $c_2'$ should be related to $c_1'$ through the same matrix postfactor through which $A_2$ is related to $A_1$.
Another way to express (1.2.2) would be to say that

$$\operatorname{Rank}\begin{bmatrix} A \\ c' \end{bmatrix} = \operatorname{Rank}[A], \tag{1.2.21}$$

which means that (1.2.2) is a convenient mathematical test for (1.2.21).







The unbiased minimum variance linear estimate of an estimable $c'\xi$ is given by

$$c_1'(A_1'A_1)^{-1}A_1'x. \tag{1.2.3}$$

The variance of this linear estimate is given by

$$c_1'(A_1'A_1)^{-1}c_1\,\sigma^2. \tag{1.2.4}$$

An unbiased estimate of $\sigma^2$ is given by

$$x'[I(n) - A_1(A_1'A_1)^{-1}A_1']x/(n - r). \tag{1.2.5}$$

(1.2.6) The least squares linear estimate of an estimable linear function $c'\xi$ is the same as the unbiased minimum variance estimate given by (1.2.3).
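Results (1.2.2) to (1.2.5) translate directly into a computation once a basis $A_1$ is fixed. The sketch below works in exact rational arithmetic. It takes as $A_1$ the first linearly independent columns of $A$ met from left to right, which is the renumbering permitted in Section 1.1. It tests estimability through the rank form (1.2.21). It then returns three quantities:

- the estimate (1.2.3);
- the factor multiplying $\sigma^2$ in (1.2.4);
- the estimate (1.2.5) of $\sigma^2$.

The function names, the one-way layout and the data are hypothetical. The layout has a general mean and two treatment effects, so $m = 3$ and $r = 2$. For the contrast between the two treatments, the three returned values are $-5$, $5/6$ and $4/3$.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def inverse(a):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [v - factor * u for v, u in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def basis_columns(a):
    """Indices of a maximal set of independent columns, scanned left to right (A_1)."""
    chosen = []
    for j in range(len(a[0])):
        trial = chosen + [j]
        if rank([[row[k] for k in trial] for row in a]) == len(trial):
            chosen = trial
    return chosen


def blue(a, x, c):
    """Estimate (1.2.3), variance factor (1.2.4) and sigma^2 estimate (1.2.5) for c'xi."""
    if rank(a + [c]) != rank(a):  # estimability in the rank form (1.2.21)
        raise ValueError("c'xi is not estimable")
    n, cols = len(a), basis_columns(a)
    r = len(cols)
    a1 = [[row[k] for k in cols] for row in a]  # the basis A_1
    c1 = [c[k] for k in cols]
    g = inverse([[sum(a1[t][i] * a1[t][j] for t in range(n)) for j in range(r)]
                 for i in range(r)])  # (A_1'A_1)^{-1}
    a1x = [sum(a1[t][i] * x[t] for t in range(n)) for i in range(r)]
    xi1_hat = [sum(g[i][j] * a1x[j] for j in range(r)) for i in range(r)]
    estimate = sum(ci * v for ci, v in zip(c1, xi1_hat))
    var_factor = sum(c1[i] * g[i][j] * c1[j] for i in range(r) for j in range(r))
    fitted = [sum(a1[t][i] * xi1_hat[i] for i in range(r)) for t in range(n)]
    sse = sum((xt - ft) ** 2 for xt, ft in zip(x, fitted))  # x'[I - A_1(A_1'A_1)^{-1}A_1']x
    return estimate, var_factor, sse / (n - r)


# Hypothetical one-way layout; columns of A are (general mean, treatment 1, treatment 2).
A = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
x = [4, 6, 5, 9, 11]
print(blue(A, x, [0, 1, -1]))  # the treatment contrast is estimable
```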

1.3 Testing of linear hypotheses. The problem is to test, in terms of the customary F-test, which has a number of well-known good properties, the linear hypothesis

$$H_0: C(s \times m)\,\xi(m \times 1) = 0(s \times 1), \quad \text{or,} \quad \begin{bmatrix} C_1 & C_2 \end{bmatrix}\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = 0(s \times 1), \tag{1.3.1}$$

where $C_1$ is $s \times r$ and $C_2$ is $s \times (m - r)$, against the alternative,

$$H_1: \begin{bmatrix} C_1 & C_2 \end{bmatrix}\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = \eta(s \times 1) \ne 0 \text{ (say)},$$

where $C(s \times m)$ is a matrix given by the hypothesis and is called the hypothesis matrix and $\eta$ is an arbitrary unspecified nonnull vector. Also $\operatorname{rank}(C) = s \le r \le m < n$. In the discussion in [11, 12] a more general $C$ is introduced, but in almost all problems of practical interest $C$ occurs in the relatively simpler form considered above. The main results follow [11, pp. 81-83b].
(1.3.2) All the following results are invariant under the choice of a basis $A_1$ of $A$ (with a consequent determination of $\xi_1$ and $C_1$).

A sufficient set of conditions (which under certain further restrictions would also be necessary) for the existence of a similar region test for (1.3.1) is given by

$$C_2 = C_1(A_1'A_1)^{-1}A_1'A_2, \tag{1.3.3}$$

or, in other words, $C_2$ should be related to $C_1$ through the same matrix postfactor through which $A_2$ is related to $A_1$. In such a case, the hypothesis (1.3.1) will be said to be testable and the condition (1.3.3) will be called the testability condition. The testability condition (1.3.3) is a close analogue of the estimability condition (1.2.2), and can also be expressed in a form similar to (1.2.21).

The F-statistic for (1.3.1), having, under $H_0$, the central F-distribution with degrees of freedom $s$ and $(n - r)$, is given by

$$F = \frac{x'A_1(A_1'A_1)^{-1}C_1'\,[C_1(A_1'A_1)^{-1}C_1']^{-1}\,C_1(A_1'A_1)^{-1}A_1'x/s}{x'[I(n) - A_1(A_1'A_1)^{-1}A_1']x/(n - r)} = \frac{\sigma^2\chi^2/s}{\sigma^2\chi_0^2/(n - r)} \text{ (say)}, \tag{1.3.4}$$

to indicate that the numerator multiplied by $s/\sigma^2$ has the $\chi^2$-distribution with degrees of freedom $s$, being a central or non-central $\chi^2$ according as $H_0$ or $H_1$ is true, and that the denominator multiplied by $(n - r)/\sigma^2$ has an independent central $\chi^2$-distribution with degrees of freedom $(n - r)$, no matter whether $H_0$ is true or not.

The quadratic form in the numerator of (1.3.4) is sometimes referred to as the 
sum of squares due to the hypothesis (1.3.1), and the quadratic form in the 
denominator is called the sum of squares due to error. 

Under $H_1$ the above F-statistic has a non-central F-distribution with degrees of freedom $s$ and $(n - r)$ and a non-centrality parameter $\delta^2/\sigma^2$ where,

$$\delta^2 = \eta'[C_1(A_1'A_1)^{-1}C_1']^{-1}\eta, \tag{1.3.5}$$

which, being a positive definite quadratic form, is zero if, and only if, $\eta = 0$, i.e., only under $H_0$.
Suppose we have two different hypotheses $H_{01}$ and $H_{02}$ given by

$$H_{01}: \begin{bmatrix} C_{11} & C_{12} \end{bmatrix}\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = 0(s_1 \times 1), \qquad H_{02}: \begin{bmatrix} C_{21} & C_{22} \end{bmatrix}\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = 0(s_2 \times 1),$$

against respective alternatives like the one indicated under (1.3.1), and suppose that $\operatorname{rank}[C_{11}\ C_{12}] = s_1$ and $\operatorname{rank}[C_{21}\ C_{22}] = s_2$, such that $s_1 + s_2 \le r \le m < n$. Then for $H_{01}$ and $H_{02}$ we shall have respectively

$$F_1 = \frac{\sigma^2\chi_1^2/s_1}{\sigma^2\chi_0^2/(n - r)} \quad \text{and} \quad F_2 = \frac{\sigma^2\chi_2^2/s_2}{\sigma^2\chi_0^2/(n - r)},$$

where the denominator in $F_1$ and $F_2$ (the same for both) is the same as that of (1.3.4), and the respective numerators are obtained by substituting $C_{11}$ and $C_{21}$ for $C_1$ in the numerator of (1.3.4). $\chi_1^2$ and $\chi_2^2$ are each distributed independently of $\chi_0^2$, but we might seek to know the condition for $\chi_1^2$ and $\chi_2^2$ to be distributed independently. The independence of $\chi_1^2$ and $\chi_2^2$, although it would not by any means imply the independence of $F_1$ and $F_2$, would nevertheless simplify the distribution problem connected with the simultaneous testing of $H_{01}$ and $H_{02}$ and the associated simultaneous confidence interval estimation. In this situation we would say that $F_1$ and $F_2$ are quasi-independent and $H_{01}$ and $H_{02}$ are testable in a quasi-independent manner. The necessary and sufficient condition for this is that

$$C_{11}(A_1'A_1)^{-1}C_{21}' = 0(s_1 \times s_2). \tag{1.3.6}$$

This could be easily generalized to $k$ hypotheses,

$$H_{0i}: \begin{bmatrix} C_{i1} & C_{i2} \end{bmatrix}\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = 0(s_i \times 1), \qquad i = 1, \ldots, k,$$

where $C_{i1}$ is $s_i \times r$, $C_{i2}$ is $s_i \times (m - r)$, and $\operatorname{rank}[C_{i1}\ C_{i2}] = s_i$, such that $\sum_{i=1}^{k} s_i \le r \le m < n$. A set of necessary and sufficient conditions for these $k$ hypotheses to be testable in a quasi-independent manner is given by,

$$C_{i1}(A_1'A_1)^{-1}C_{j1}' = 0(s_i \times s_j), \qquad (i \ne j = 1, \ldots, k). \tag{1.3.7}$$
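The sketch below illustrates (1.3.4) and (1.3.6). It computes the F-statistic from $\hat\eta = C_1\hat\xi_1$, where $\hat\xi_1 = (A_1'A_1)^{-1}A_1'x$. This uses the fact that the numerator quadratic form of (1.3.4) equals $\hat\eta'[C_1(A_1'A_1)^{-1}C_1']^{-1}\hat\eta$. It also tests the quasi-independence condition for two single-row hypotheses.

It assumes a full-rank parametrization with no $A_2$ part, for which the testability condition (1.3.3) holds automatically. The balanced one-way layout with three groups, its data, and the function names are hypothetical. In this case the statistic coincides with the usual between-groups F-ratio, and the two contrasts $(1, -1, 0)$ and $(1, 1, -2)$ satisfy (1.3.6).

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [v - factor * u for v, u in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def f_statistic(a1, c1, x):
    """F of (1.3.4) for H_0: C_1 xi_1 = 0, with a full-column-rank A_1 (n x r)
    and C_1 (s x r) of rank s."""
    n, r, s = len(a1), len(a1[0]), len(c1)
    g = inverse(matmul(transpose(a1), a1))                       # (A_1'A_1)^{-1}
    xi_hat = matmul(g, matmul(transpose(a1), [[v] for v in x]))  # r x 1
    eta_hat = matmul(c1, xi_hat)                                 # C_1 xi_1-hat, s x 1
    w = inverse(matmul(matmul(c1, g), transpose(c1)))            # [C_1 G C_1']^{-1}
    ss_h = matmul(matmul(transpose(eta_hat), w), eta_hat)[0][0]  # hypothesis SS
    fitted = matmul(a1, xi_hat)
    ss_e = sum((v - f[0]) ** 2 for v, f in zip(x, fitted))       # error SS
    return (ss_h / s) / (ss_e / (n - r))


def quasi_independent(a1, c_first, c_second):
    """Condition (1.3.6): C_11 (A_1'A_1)^{-1} C_21' is a null matrix."""
    g = inverse(matmul(transpose(a1), a1))
    product = matmul(matmul(c_first, g), transpose(c_second))
    return all(v == 0 for row in product for v in row)


# Hypothetical balanced one-way layout in cell-means form: r = m = 3, n = 9.
A1 = [[1, 0, 0]] * 3 + [[0, 1, 0]] * 3 + [[0, 0, 1]] * 3
x = [4, 6, 5, 9, 11, 10, 7, 8, 9]
C1 = [[1, -1, 0], [1, 0, -1]]  # H_0: mu_1 = mu_2 = mu_3, so s = 2
print(f_statistic(A1, C1, x))
print(quasi_independent(A1, [[1, -1, 0]], [[1, 1, -2]]))
```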


1.4 The associated confidence bounds. We observe from (1.3.1) that $\eta$ $(\ne 0)$ represents a deviation from the null hypothesis $H_0$. The main results follow [11, 12].

With a joint confidence coefficient $\ge 1 - \alpha$, for a preassigned $\alpha$, we have the following simultaneous confidence bounds:

$$[\sigma^2\chi^2]^{1/2} - \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2} \le \{\eta'[C_1(A_1'A_1)^{-1}C_1']^{-1}\eta\}^{1/2} \le [\sigma^2\chi^2]^{1/2} + \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2}, \tag{1.4.1}$$

where $\sigma^2\chi^2$ and $\sigma^2\chi_0^2$ (both independent of $\sigma^2$) are just the quantities defined in (1.3.4), and $F_\alpha(s, n - r)$ is the upper $\alpha$-point of the central F-distribution with degrees of freedom $s$ and $(n - r)$;

$$[\sigma^2\chi^{(i)2}]^{1/2} - \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2} \le \{\eta^{(i)\prime}[C_1^{(i)}(A_1'A_1)^{-1}C_1^{(i)\prime}]^{-1}\eta^{(i)}\}^{1/2} \le [\sigma^2\chi^{(i)2}]^{1/2} + \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2}, \tag{1.4.2}$$

for $i = 1, 2, \ldots, s$, where $\eta^{(i)}$, $C_1^{(i)}$ and $\sigma^2\chi^{(i)2}$ denote, respectively, the vector $\eta$ with the $i$th component left out, the matrix $C_1$ with the $i$th row left out and the $\sigma^2\chi^2$ defined in (1.3.4) with $C_1^{(i)}((s - 1) \times r)$ in place of $C_1(s \times r)$; and likewise

$$[\sigma^2\chi^{(ij)2}]^{1/2} - \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2} \le \{\eta^{(ij)\prime}[C_1^{(ij)}(A_1'A_1)^{-1}C_1^{(ij)\prime}]^{-1}\eta^{(ij)}\}^{1/2} \le [\sigma^2\chi^{(ij)2}]^{1/2} + \left[F_\alpha(s, n - r)\,\frac{s\,\sigma^2\chi_0^2}{n - r}\right]^{1/2}, \tag{1.4.3}$$

for $i \ne j = 1, 2, \ldots, s$, where $\eta^{(ij)}$, $C_1^{(ij)}$ and $\sigma^2\chi^{(ij)2}$ denote, respectively, the vector $\eta$ with the $i$th and $j$th components left out, the matrix $C_1$ with the $i$th and $j$th rows left out and the $\sigma^2\chi^2$ defined in (1.3.4) with $C_1^{(ij)}((s - 2) \times r)$ in place of $C_1(s \times r)$; and so on, till we come down to just any single element of $\eta$, a single row of $C_1$ and a consequent truncation on $\chi^2$. Notice that there are $\binom{s}{1}$ statements like (1.4.2), $\binom{s}{2}$ statements like (1.4.3) and so on, till we finally come down to $\binom{s}{s-1}$ or $\binom{s}{1}$ statements at the end. Thus the total number of confidence statements like these would be $2^s - 1$. We observe, from these simultaneous confidence bounds, that we obtain confidence bounds not only on parametric functions which measure departure from the whole set of $s$ linear hypotheses in $H_0$ but also on parametric functions which measure departures from all possible subsets of the $s$ linear hypotheses in $H_0$.


2. Univariate Variance Components. 
2.1 The Model II of ANOVA. Let x’(1 XK n) = (1, 22, °- 


-,2%,) be a set of 
n observable stochastic variates such that 


(2.1.1) x(n X 1) = A(n X m) E(m X 1) + e(n X 1), 
n[Ay Ay, °:> A,| & jm 
mM, +++ Mm E> |m2 
& |m: 
1 
k 
+e2,>,m;=m, (say), 
t=1 


where $A(n \times m)$, to be called the design matrix, is a matrix whose elements are constants given by the design of the experiment and is of rank $r \le m < n$, and where

(i) $\xi_i(m_i \times 1)$ is a random sample of size $m_i$ from the normal population $N(\mu_i, \sigma_i^2)$ for $i = 1, 2, \ldots, k$, and $e(n \times 1)$ and the $\xi_i$'s ($i = 1, 2, \ldots, k$) are mutually independent;

(ii) $e(n \times 1)$, whose elements are physically of the nature of errors, is a random sample from $N(0, \sigma^2)$.

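Under this model it is seen that $x(n \times 1)$ is $n$-variate normal $N[E(x), \Sigma(n \times n)]$, where

$$E(x)(n\times 1) = A(n\times m)\begin{pmatrix}\mu_1 1(m_1\times 1)\\ \mu_2 1(m_2\times 1)\\ \vdots\\ \mu_k 1(m_k\times 1)\end{pmatrix}, \tag{2.1.2}$$

$1(m_i \times 1)$ denoting a vector of $m_i$ unities ($i = 1, \ldots, k$), and

$$\Sigma(n\times n) = E(xx') - E(x)E(x') = A(n\times m)\begin{pmatrix}\sigma_1^2 I(m_1) & 0 & \cdots & 0\\ 0 & \sigma_2^2 I(m_2) & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_k^2 I(m_k)\end{pmatrix}A'(m\times n) + \sigma^2 I(n) = \sum_{i=1}^{k}\sigma_i^2A_iA_i' + \sigma^2 I(n). \tag{2.1.3}$$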








310 S. N. ROY AND R. GNANADESIKAN


As in Section 1.1, it is to be observed that, for purposes of point estimation, the assumption of normality of the distributions made in (i) and (ii) of Model II is not necessary. In fact, this assumption is probably not even as realistic in practice as the assumption of sampling from finite populations with certain known probabilities, which will be discussed in later papers. The assumption of normality does, however, simplify the distribution problems connected with testing of hypotheses (simultaneous and/or separate) and confidence interval estimation.

We shall consider, in greater detail, that restricted type of $k$-way classification for which the design matrix $A(n \times m)$ is such that each row of the submatrix $A_i(n \times m_i)$ ($i = 1, 2, \ldots, k$) has one and only one non-zero element, which is equal to unity, and $\operatorname{rank}(A) = m - k + 1$. [Notice that in general, for such $A$, $\operatorname{rank}(A) \le m - k + 1$, since $A_1 1(m_1 \times 1) = A_2 1(m_2 \times 1) = \cdots = A_k 1(m_k \times 1) = 1(n \times 1)$ gives $k - 1$ independent linear relations among the columns of $A$.] It can be seen that the usual complete and incomplete connected designs are included in this general case. For one thing, these designs are relatively simple to discuss from the standpoint of either testing of hypotheses or confidence interval estimation under Model II; for another, this discussion will prepare the ground for the relatively more difficult problem of unconnected designs, which will be discussed in subsequent papers.
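To make the restricted classification concrete, the following Python sketch builds $A = [A_1\ A_2]$ for a hypothetical two-way crossed layout ($k = 2$, $a$ row levels, $b$ column levels, one observation per cell; the layout and all function names are our illustrative assumptions), checks $\operatorname{rank}(A) = m - k + 1$ by exact elimination, and forms $\Sigma$ from (2.1.3):

```python
from fractions import Fraction

def two_way_design(a, b):
    """Hypothetical example: A1 (n x a) marks the row level and A2 (n x b)
    the column level of observation (r, c), stored in row r*b + c, so every
    row of A1 and of A2 has exactly one unit element."""
    A1 = [[1 if j == r else 0 for j in range(a)] for r in range(a) for c in range(b)]
    A2 = [[1 if j == c else 0 for j in range(b)] for r in range(a) for c in range(b)]
    return A1, A2

def rank(M):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def model_ii_dispersion(blocks, comp_vars, err_var):
    """Sigma = sum_i sigma_i^2 A_i A_i' + sigma^2 I(n), as in (2.1.3)."""
    n = len(blocks[0])
    S = [[err_var if u == v else 0.0 for v in range(n)] for u in range(n)]
    for Ai, s2 in zip(blocks, comp_vars):
        for u in range(n):
            for v in range(n):
                S[u][v] += s2 * sum(x * y for x, y in zip(Ai[u], Ai[v]))
    return S

a, b = 3, 4
A1, A2 = two_way_design(a, b)
A = [r1 + r2 for r1, r2 in zip(A1, A2)]
m, k = a + b, 2
print(rank(A), m - k + 1)                                      # 6 6
Sigma = model_ii_dispersion([A1, A2], [2.0, 0.5], 1.0)
print(Sigma[0][0], Sigma[0][1], Sigma[0][b], Sigma[0][b + 1])  # 3.5 2.0 0.5 0.0
```

With $a = 3$ and $b = 4$ we have $m = 7$ and rank $6 = m - k + 1$; the entries of $\Sigma$ show that two observations sharing a row level have covariance $\sigma_1^2$, two sharing a column level have covariance $\sigma_2^2$, and all others are uncorrelated. Exact rational elimination is used so that no numerical tolerance has to be chosen for the rank.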

Our objectives are: (i) to estimate any estimable linear function of $\mu_1, \mu_2, \ldots, \mu_k$ and to test testable linear hypotheses on $\mu_1, \mu_2, \ldots, \mu_k$; (ii) to obtain estimates of, and test hypotheses on, the variance components $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma^2$; and (iii) to obtain confidence bounds on the parameters or on certain parametric functions [e.g., ratios like $\sigma_i^2/\sigma^2$] which are meaningful measures of departure from certain customary null hypotheses.

2.2 Linear estimation and testing of linear hypotheses. Using a result given in [11], we can establish the following lemma [3, pp. 59-63].

Lemma 1: For the restricted $k$-way classification defined above, the necessary and sufficient condition for the estimability of $c'(1 \times m)E(\xi)(m \times 1)$, a linear function of $\mu_1, \ldots, \mu_k$, is that coefficient of $\mu_1$ = coefficient of $\mu_2$ = $\cdots$ = coefficient of $\mu_k$.

This lemma establishes that, for the restricted $k$-way classification defined above, the only independent linear function of $\mu_1, \ldots, \mu_k$ which is estimable, and hypotheses on which are testable, is the sum $\mu = \mu_1 + \mu_2 + \cdots + \mu_k$, all other such functions being merely multiples of $\mu$.
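One way to see why Lemma 1 holds: since every row of $A_i$ has exactly one unit element, $A_i 1(m_i \times 1) = 1(n \times 1)$ for each $i$. Hence, for any linear function $l'x$ of the observations, by (2.1.1) and (2.1.2),

$$E(l'x) = l'A\,E(\xi) = \sum_{i=1}^{k}\mu_i\,l'A_i 1(m_i\times 1) = \{l' 1(n\times 1)\}\,(\mu_1 + \mu_2 + \cdots + \mu_k),$$

so every linear unbiased estimate estimates a multiple of $\mu$, and the choice $l = n^{-1}1(n \times 1)$ shows that the grand mean $\bar{x}$ is unbiased for $\mu$ itself.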

2.3 Estimation of the variance components. We shall seek $(k + 1)$ quadratic forms, $q_i = x'Q_ix$ ($i = 0, 1, \ldots, k$), of the observations, to be utilized in the point estimation of, testing of hypotheses on, and confidence interval estimation of the variance components. We shall impose on the $q_i$'s the following restrictions, which will be justified presently:

(2.3.1) $q_i$, of rank $n_i$, is distributed as $\lambda_i\chi^2_{(n_i)}$, where $\chi^2_{(n_i)}$ denotes the central $\chi^2$ variate with $n_i$ degrees of freedom and where $\lambda_i = E(q_i/n_i)$ ($i = 0, 1, \ldots, k$).

(2.3.2) The $q_i$'s are mutually independent.


Lemma 2: If $x(n \times 1)$ is distributed as $N[E(x), \Sigma(n \times n)]$ and $q_i = x'Q_ix$ ($i = 0, 1, \ldots, k$) is a quadratic form of rank $n_i$ ($\sum_{i=0}^{k}n_i \le n$), then a set of necessary and sufficient conditions for $q_0, q_1, \ldots, q_k$ to satisfy the above restrictions is given by

(α) $Q_i\Sigma Q_i = \lambda_iQ_i$, $i = 0, 1, \ldots, k$;

(β) $E(x')Q_iE(x) = 0$, $i = 0, 1, \ldots, k$;

(γ) $Q_i\Sigma Q_j = 0(n \times n)$, $i \ne j = 0, 1, \ldots, k$,

where (α) ensures $\chi^2$-distributions (in general non-central), (β) ensures the centrality of the $\chi^2$-distributions, and (γ) actually ensures pairwise independence but is easily checked to ensure mutual independence of the distributions as well in this case. For a proof of this lemma see [1, 6].

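Lemma 3: If $E[x(p \times 1)y'(1 \times p)] = \Sigma(p \times p)$, then $E(x'Qy)$, where $Q(p \times p)$ is symmetric, is $\operatorname{tr}\Sigma Q$, where "tr $A$" denotes the trace (the sum of the diagonal elements) of the square matrix $A$.

Proof:

$$E(x'Qy) = E\Big(\sum_{i,j=1}^{p}q_{ij}x_iy_j\Big), \quad\text{if } Q(p\times p) = ((q_{ij})),$$
$$= \sum_{i,j=1}^{p}q_{ij}\sigma_{ij}, \quad\text{if } \Sigma(p\times p) = ((\sigma_{ij})),$$
$$= \sum_{i,j=1}^{p}q_{ji}\sigma_{ij}, \quad\text{since } q_{ij} = q_{ji},$$
$$= \operatorname{tr}\Sigma Q.$$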
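Corollary: For Model II of ANOVA, we have

$$\lambda_i = E\Big(\frac{q_i}{n_i}\Big) = \frac{1}{n_i}E(x'Q_ix) = \frac{1}{n_i}\operatorname{tr}\{[\Sigma + E(x)E(x')]Q_i\}, \tag{2.3.3}$$

where $E(x)$ and $\Sigma$ are defined in (2.1.2) and (2.1.3),

$$= \frac{1}{n_i}\{\operatorname{tr}\Sigma Q_i + E(x')Q_iE(x)\},$$

since $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and tr (scalar) = scalar ([11], p. A-1),

$$= \frac{1}{n_i}\operatorname{tr}\Sigma Q_i,$$

if $q_i$ satisfies (β) of Lemma 2,

$$= \frac{1}{n_i}\Big[\sum_{j=1}^{k}\sigma_j^2\operatorname{tr}A_jA_j'Q_i + \sigma^2\operatorname{tr}Q_i\Big],$$

using (2.1.3). This holds for $i = 0, 1, \ldots, k$.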







Next, suppose that in Lemma 2 $\Sigma(n \times n)$ is unknown, and we require, as a further condition on the $q_i$'s, that $q_0, q_1, \ldots, q_k$ satisfy (α) and (γ) of Lemma 2 for all symmetric positive definite matrices $\Sigma(n \times n)$. Under Model II, this means that, for all $\sigma_1^2, \ldots, \sigma_k^2$ and $\sigma^2$, we require the quadratic forms $q_0, \ldots, q_k$ to satisfy (α) and (γ) in addition to (β). Using (2.1.3) and (2.3.3), and equating the coefficients of $\sigma_1^2, \ldots, \sigma_k^2$ and $\sigma^2$ on the two sides, we find that, for Model II, (α) and (γ) reduce respectively to

$$Q_iA_lA_l'Q_i = \Big[\frac{1}{n_i}\operatorname{tr}A_lA_l'Q_i\Big]Q_i,\ \ l = 1, 2, \ldots, k; \qquad Q_i^2 = \Big[\frac{1}{n_i}\operatorname{tr}Q_i\Big]Q_i \qquad (i = 0, 1, \ldots, k), \tag{2.3.4}$$

and

$$Q_iA_lA_l'Q_j = 0(n\times n),\ \ l = 1, \ldots, k; \qquad Q_iQ_j = 0(n\times n) \qquad (i \ne j = 0, 1, \ldots, k). \tag{2.3.5}$$


Before we proceed further we shall justify the restrictions (2.3.1) and (2.3.2). 
These restrictions provide a set of sufficient (though not necessary) conditions 
for ensuring certain good properties of the solutions to the problems of point 
estimation, testing of hypotheses and confidence interval estimation. 

From the standpoint of point estimation, we have the following lemma: 

Lemma 4: Under Model II, if $q_0, q_1, \ldots, q_k$ are $(k + 1)$ quadratic forms, of ranks $n_0, n_1, \ldots, n_k$ respectively, satisfying (2.3.1) and (2.3.2), and $\lambda_i = E(q_i/n_i)$, then the unbiased estimate with uniformly least variance of the estimable linear function $\sum_{i=0}^{k}l_i\lambda_i$ is given by $\sum_{i=0}^{k}l_iq_i/n_i$, and this estimate is a unique (except on a set of measure zero) function of $q_0, q_1, \ldots, q_k$. This lemma is essentially the same as the result given by the authors of [4], and the proof follows from a theorem of Lehmann and Scheffé [5] when we notice that, under Model II, if $q_0, \ldots, q_k$ satisfy (2.3.1) and (2.3.2), then $q_0, \ldots, q_k$ form a set of sufficient statistics for $\lambda_0, \lambda_1, \ldots, \lambda_k$ (also for $\sigma^2, \sigma_1^2, \ldots, \sigma_k^2$), and that they can also be shown to satisfy the completeness condition of Lehmann and Scheffé [5].

It may be noted that Lemma 4 holds not only for a linear function of the $\lambda_i$'s, but also, in general, for any real-valued estimable function $f(\lambda_0, \lambda_1, \ldots, \lambda_k)$.

Next, from the standpoint of testing of hypotheses, we observe that hypotheses on variance components are usually composite and that a legitimate quest might be to obtain similar region tests for these hypotheses. From the properties of sufficiency and completeness mentioned above for $q_0, q_1, \ldots, q_k$ satisfying (2.3.1) and (2.3.2), it follows, from another theorem of Lehmann and Scheffé [5], that the class of all similar tests of hypotheses on the $\sigma^2$'s will be of Neyman structure, or Neyman mechanism regions [8], with respect to $q_0, q_1, \ldots, q_k$.

Finally, if the quadratic forms $q_0, q_1, \ldots, q_k$ satisfy these restrictions, then, as will be seen later in this paper, we can obtain simultaneous confidence intervals on $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma^2$ and on ratios like $\sigma_i^2/\sigma^2$ without running into intractable distribution problems or nuisance parameters, although it is not claimed that this would be impossible except under these restrictions.

We shall now proceed to obtain a set of quadratic forms $q_0, q_1, \ldots, q_k$ for the restricted $k$-way classification and present a tie-up between the analysis under Model I, considered in Section 1, and the analysis under Model II.

2.4 Tie-up between the analysis under Models I and II for the restricted $k$-way classification. We recall from Section 1.3 that, under Model I, we can obtain $k$ sums of squares due to the $k$ testable hypotheses of equality of the elements of $\xi_i$ ($i = 1, 2, \ldots, k$), which can, by analogy with (1.3.1), be written as

$$H_{0i}: C_i((m_i - 1)\times m)\,\xi(m\times 1) = [C_{i1}\ \ C_{i2}]\,\xi(m\times 1) = 0((m_i - 1)\times 1), \tag{2.4.1}$$

where $C_{i1}$ and $C_{i2}$ consist, respectively, of the first $r$ and the last $(m - r)$ columns of $C_i$, $r = \operatorname{rank}(A) = m - k + 1$, and $\operatorname{rank}(C_i) = m_i - 1$ ($i = 1, 2, \ldots, k$). It is easily verified that, if $\xi_i' = (\xi_{i1}, \xi_{i2}, \ldots, \xi_{im_i})$, then the hypothesis $\xi_{i1} = \cdots = \xi_{im_i}$ is exactly equivalent to (2.4.1). As in Section 1.3, we obtain $k$ sums of squares due to the $k$ hypotheses $H_{01}, H_{02}, \ldots, H_{0k}$, viz.,

$$x'A_1(A_1'A_1)^{-1}C_{i1}'[C_{i1}(A_1'A_1)^{-1}C_{i1}']^{-1}C_{i1}(A_1'A_1)^{-1}A_1'x \tag{2.4.2}$$

for $i = 1, 2, \ldots, k$, with $(m_i - 1)$ $(= n_i$, say$)$ degrees of freedom; here, as in Section 1, $A_1(n \times r)$ denotes a basis of $A$ (not the first submatrix in (2.1.1)). We further have the sum of squares due to error,

$$x'[I(n) - A_1(A_1'A_1)^{-1}A_1']x, \tag{2.4.3}$$

with degrees of freedom $n - r = n - m + k - 1$ $(= n_0$, say$)$. Notice that $\sum_{i=0}^{k}n_i = n - 1 < n$.

Now, under Model II, in the notation of Section 2.3, we take $q_0$ to be the sum of squares due to error given by (2.4.3) and $q_i$, for $i = 1, 2, \ldots, k$, to be the sums of squares due to the hypotheses given by (2.4.2). If we then apply the conditions (α), (β) and (γ) of Lemma 2 to this set of quadratic forms, we can verify that $q_0, q_1, \ldots, q_k$ all satisfy (β), so that centrality of the distribution (if it is $\chi^2$ at all) is assured for each $q_i$ ($i = 0, 1, \ldots, k$). Using the fact that the matrices of these quadratic forms are all idempotent, and by repeated application of Lemma 3, we can also obtain that


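$$\lambda_0 = \sigma^2, \qquad \lambda_i = \sigma^2 + \nu_i\sigma_i^2, \quad i = 1, 2, \ldots, k, \tag{2.4.4}$$

where

$$\nu_i = \frac{2 \times (\text{sum of all the elements in and below the diagonal of the symmetric matrix } [C_{i1}(A_1'A_1)^{-1}C_{i1}']^{-1})}{m_i - 1}.$$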
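For a balanced two-way layout (our illustrative assumption: $k = 2$, one observation per cell, where the usual row, column and error sums of squares coincide with (2.4.2) and (2.4.3)), (2.4.4) can be checked numerically through the Corollary to Lemma 3. The sketch below computes $\lambda_i = \operatorname{tr}(\Sigma Q_i)/n_i$ and recovers $\lambda_0 = \sigma^2$, $\lambda_1 = \sigma^2 + b\sigma_1^2$, $\lambda_2 = \sigma^2 + a\sigma_2^2$, i.e. $\nu_1 = b$ and $\nu_2 = a$ for this design; all function names are hypothetical.

```python
def design_index(a, b):
    """Cells (r, c) of a balanced a x b layout with one observation per cell."""
    return [(r, c) for r in range(a) for c in range(b)]

def sums_of_squares_matrices(a, b):
    """Q0 (error), Q1 (rows), Q2 (columns) of the usual balanced analysis."""
    idx, g = design_index(a, b), 1.0 / (a * b)
    Q1 = [[(1.0 / b if u[0] == v[0] else 0.0) - g for v in idx] for u in idx]
    Q2 = [[(1.0 / a if u[1] == v[1] else 0.0) - g for v in idx] for u in idx]
    Q0 = [[(1.0 if u == v else 0.0) - Q1[i][j] - Q2[i][j] - g
           for j, v in enumerate(idx)] for i, u in enumerate(idx)]
    return Q0, Q1, Q2

def dispersion(a, b, var_rows, var_cols, var_err):
    """Sigma of (2.1.3): sigma_1^2 A1A1' + sigma_2^2 A2A2' + sigma^2 I(n)."""
    idx = design_index(a, b)
    return [[(var_err if u == v else 0.0) + (var_rows if u[0] == v[0] else 0.0)
             + (var_cols if u[1] == v[1] else 0.0) for v in idx] for u in idx]

def trace_product(S, Q):
    """tr(S Q) = sum over u, v of S[u][v] * Q[v][u]."""
    n = len(S)
    return sum(S[u][v] * Q[v][u] for u in range(n) for v in range(n))

def lambdas(a, b, var_rows, var_cols, var_err):
    """lambda_i = tr(Sigma Q_i) / n_i, the Corollary (2.3.3) once (beta) holds."""
    S = dispersion(a, b, var_rows, var_cols, var_err)
    dof = [(a - 1) * (b - 1), a - 1, b - 1]
    return [trace_product(S, Q) / d
            for Q, d in zip(sums_of_squares_matrices(a, b), dof)]

lam0, lam1, lam2 = lambdas(3, 4, var_rows=2.0, var_cols=0.5, var_err=1.0)
print(round(lam0, 10), round(lam1, 10), round(lam2, 10))   # 1.0 9.0 2.5
```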


Thus $(1/\nu_i)(q_i/n_i - q_0/n_0)$ will be an unbiased estimate of $\sigma_i^2$, while $q_0/n_0$ will be an unbiased estimate of $\sigma^2$, so that, if these particular $q_0, q_1, \ldots, q_k$ satisfy (α) and (γ) of Lemma 2 as well, then, by Lemma 4, these estimates will also have uniformly least variance in the class of unbiased estimates. It is easily verified that $q_0$ always satisfies (α) and (γ) of Lemma 2, so that the error sum of squares obtained under Model I is always, under Models I and II, distributed as $\sigma^2\chi^2_{(n_0)}$, where $\chi^2_{(n_0)}$ is the central $\chi^2$ variate with $n_0$ $(= n - m + k - 1)$ degrees of freedom, independently of $q_1, q_2, \ldots, q_k$, and therefore always provides (when divided by $n_0$) an unbiased estimate, with uniformly least variance, of $\sigma^2$. Next, applying the conditions (α) and (γ) of Lemma 2 to $q_i$ ($i = 1, 2, \ldots, k$) and simplifying, we get them respectively in the forms


$$C_{i1}(A_1'A_1)^{-1}C_{i1}' = \frac{1}{\nu_i}[I(m_i - 1) + J((m_i - 1)\times(m_i - 1))], \quad i = 1, 2, \ldots, k, \tag{2.4.5}$$

where $J(p \times q)$ stands for a matrix all of whose elements are equal to unity, and

$$C_{i1}(A_1'A_1)^{-1}C_{j1}' = 0((m_i - 1)\times(m_j - 1)), \quad i \ne j = 1, 2, \ldots, k. \tag{2.4.6}$$

Note that (2.4.5) and (2.4.6) are independent of the unknown variance components $\sigma_1^2, \ldots, \sigma_k^2$ and $\sigma^2$.

The conditions (2.4.5) and (2.4.6) are both satisfied by the usual complete designs, such as Randomized Block, Latin Square and Factorial designs, under a strictly additive model with no interactions. However, incomplete designs, such as Balanced Incomplete Block designs, do not, in general, satisfy (2.4.6), although they do satisfy (2.4.5). Thus the restrictions (2.3.1) and (2.3.2), taken together, are not too restrictive, in that the usual complete designs have sums of squares like (2.4.2) which are useful in the analysis (to answer customary questions) under both Models I and II. However, they are restrictive in that the incomplete designs do not have sums of squares like (2.4.2) that can be used directly and conveniently in the analyses of both models.
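The claim for complete designs can be checked numerically in a small case. Treating a balanced two-way layout with both classifications random as a stand-in for a randomized block design (our assumption), the Python sketch below verifies conditions (α) and (γ) of Lemma 2 directly for several arbitrary choices of $\sigma_1^2, \sigma_2^2, \sigma^2$; since for the $q_i$ of (2.4.2) and (2.4.3) these conditions reduce to (2.4.5) and (2.4.6), a violation of zero (up to rounding) is consistent with the statement above. Function names are hypothetical.

```python
def sums_of_squares_matrices(a, b):
    """Q0 (error), Q1 (rows), Q2 (columns) for a balanced a x b layout with one
    observation per cell (observation (r, c) at index r*b + c)."""
    idx, g = [(r, c) for r in range(a) for c in range(b)], 1.0 / (a * b)
    Q1 = [[(1.0 / b if u[0] == v[0] else 0.0) - g for v in idx] for u in idx]
    Q2 = [[(1.0 / a if u[1] == v[1] else 0.0) - g for v in idx] for u in idx]
    Q0 = [[(1.0 if u == v else 0.0) - Q1[i][j] - Q2[i][j] - g
           for j, v in enumerate(idx)] for i, u in enumerate(idx)]
    return [Q0, Q1, Q2]

def dispersion(a, b, var_rows, var_cols, var_err):
    """Sigma = sigma_1^2 A1A1' + sigma_2^2 A2A2' + sigma^2 I(n)."""
    idx = [(r, c) for r in range(a) for c in range(b)]
    return [[(var_err if u == v else 0.0) + (var_rows if u[0] == v[0] else 0.0)
             + (var_cols if u[1] == v[1] else 0.0) for v in idx] for u in idx]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def lemma2_violation(a, b, var_rows, var_cols, var_err):
    """Largest entry of |Q_i S Q_i - lambda_i Q_i| (condition alpha) and of
    |Q_i S Q_j|, i != j (condition gamma)."""
    Qs = sums_of_squares_matrices(a, b)
    S = dispersion(a, b, var_rows, var_cols, var_err)
    dof = [(a - 1) * (b - 1), a - 1, b - 1]
    worst = 0.0
    for i, Qi in enumerate(Qs):
        QiS = matmul(Qi, S)
        lam = sum(QiS[t][t] for t in range(len(S))) / dof[i]   # tr(Q_i S) / n_i
        for j, Qj in enumerate(Qs):
            P = matmul(QiS, Qj)
            for r, row in enumerate(P):
                for c, val in enumerate(row):
                    expected = lam * Qi[r][c] if i == j else 0.0
                    worst = max(worst, abs(val - expected))
    return worst

for params in [(2.0, 0.5, 1.0), (0.1, 3.0, 0.7), (0.0, 0.0, 1.0)]:
    print(lemma2_violation(3, 4, *params) < 1e-9)   # True
```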

In Sections 2.5 and 2.6, we shall discuss some simple situations by assuming that the $k$-way classification under consideration has sums of squares like (2.4.2) which satisfy both (2.4.5) and (2.4.6), thus rendering relatively easy the simultaneous testing and simultaneous confidence interval estimation of $\sigma_1^2/\sigma^2, \ldots, \sigma_k^2/\sigma^2$. If, however, (2.4.6) were not satisfied but merely (2.4.5), as, for example, in incomplete block designs, then simultaneous testing or simultaneous interval estimation would be far more difficult (and will be discussed in later papers), but separate tests and separate confidence interval estimation can be obtained in exactly the same way as in the following discussion. Problems involving interactions in factorial designs will also be discussed in later communications.

2.5 Tests of hypotheses on the variance components. The usual hypotheses tested are $H_{0i}: \sigma_i^2 = 0$ against respective alternatives $H_{1i}: \sigma_i^2 > 0$, for $i = 1, 2, \ldots, k$. Working in terms of the sums of squares (2.4.2) and (2.4.3), it can be seen from (2.4.4) that these hypotheses are equivalent to $H_{0i}: \lambda_i = \lambda_0$ against $H_{1i}: \lambda_i > \lambda_0$, for $i = 1, 2, \ldots, k$. For each $i$, therefore, under Model II, we can test $H_{0i}$ against $H_{1i}$ by taking as the critical region the region defined by

$$F_i = \frac{q_i/n_i}{q_0/n_0} \ge F_\alpha(n_i, n_0), \tag{2.5.1}$$

where $F_i$, under $H_{0i}$, has a central $F$-distribution with $n_i$ and $n_0$ degrees of freedom, and $F_\alpha(n_i, n_0)$ is the upper $100\alpha\%$ point of the central $F$-distribution with $n_i$ and $n_0$ degrees of freedom. From (1.3.4), notice that the critical regions (2.5.1) for the individual hypotheses $H_{0i}$, under Model II, are identical with those obtained for the individual hypotheses (2.4.1) under Model I.
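As a worked illustration of (2.5.1), the Python sketch below computes $F_1$ and $F_2$ for a balanced two-way table with one observation per cell, using the familiar balanced-layout formulas for $q_0, q_1, q_2$ (our assumption; the data and the function name are made up for illustration):

```python
def two_way_f_ratios(x):
    """F_i = (q_i/n_i)/(q_0/n_0) of (2.5.1) for a balanced a x b table x
    with one observation per cell; i = 1 rows, i = 2 columns."""
    a, b = len(x), len(x[0])
    grand = sum(map(sum, x)) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(x[r][c] for r in range(a)) / a for c in range(b)]
    q1 = b * sum((mr - grand) ** 2 for mr in row_means)           # n1 = a - 1
    q2 = a * sum((mc - grand) ** 2 for mc in col_means)           # n2 = b - 1
    q0 = sum((x[r][c] - row_means[r] - col_means[c] + grand) ** 2  # error
             for r in range(a) for c in range(b))
    n0, n1, n2 = (a - 1) * (b - 1), a - 1, b - 1
    return (q1 / n1) / (q0 / n0), (q2 / n2) / (q0 / n0)

F1, F2 = two_way_f_ratios([[2, 4], [5, 9], [8, 8]])
print(F1, F2)   # 7.0 3.0
```

Here $n_1 = n_0 = 2$, and for 2 and 2 degrees of freedom $P(F \ge f) = 1/(1 + f)$, so $F_{0.05}(2, 2) = 19$; with these made-up data $H_{01}$ would not be rejected at the 5% level.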

The critical regions under Models I and II are identical in nature even when we consider the simultaneous hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = 0$ against the alternative $H_1$: at least one $\sigma_i^2 > 0$, which is equivalent to considering $H_0: \lambda_1/\lambda_0 = \cdots = \lambda_k/\lambda_0 = 1$ against $H_1$: at least one $\lambda_i/\lambda_0 > 1$. The critical region of the simultaneous test obtained by the heuristic union-intersection principle [9] is

$$F_1 \ge a_1 \ \text{ or } \ F_2 \ge a_2 \ \text{ or } \cdots \text{ or } \ F_k \ge a_k, \tag{2.5.2}$$

where $F_i = (q_i/n_i)/(q_0/n_0)$; the $F_i$'s are, in the terminology of Section 1.3, quasi-independent variance ratios, and the $a_i$'s are such that the region (2.5.2) is of size $\alpha$ for a preassigned $\alpha$. It is easily seen that the critical region (2.5.2) is identical with that of the simultaneous ANOVA test of Ghosh for Model I [7].
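The constants $a_i$ in (2.5.2) come from the joint distribution of the quasi-independent $F_i$'s [7]. As a rough illustration only, the following Monte Carlo sketch (our simplification: a common constant $a_1 = \cdots = a_k$, simulated rather than tabulated; the function name is hypothetical) estimates that constant under $H_0$ and compares it with the single-test point for $F_1$:

```python
import random

def simultaneous_critical_value(dofs, n0, alpha, reps=20000, seed=1):
    """Monte Carlo sketch under H0: F_i = (chi2_{n_i}/n_i) / (chi2_{n0}/n0)
    with one shared denominator (quasi-independence).  Returns the empirical
    upper-alpha point of max_i F_i and that of F_1 alone."""
    rng = random.Random(seed)

    def chi2(d):
        # a chi-square variate with d degrees of freedom is Gamma(d/2, scale 2)
        return rng.gammavariate(d / 2.0, 2.0)

    maxima, firsts = [], []
    for _ in range(reps):
        denom = chi2(n0) / n0
        fs = [chi2(d) / d / denom for d in dofs]
        maxima.append(max(fs))
        firsts.append(fs[0])
    maxima.sort()
    firsts.sort()
    cut = int((1.0 - alpha) * reps)
    return maxima[cut], firsts[cut]

a_common, a_single = simultaneous_critical_value([2, 2], n0=2, alpha=0.05)
print(a_common >= a_single)   # True
```

Because $\max_i F_i \ge F_1$ on every replicate, the simultaneous constant can never fall below the separate one; for 2 and 2 degrees of freedom the exact separate point is 19, so `a_single` should come out near that value.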

2.6 Simultaneous confidence statements. When the $q_i$'s, given by (2.4.2) and (2.4.3), satisfy the restrictions (2.3.1) and (2.3.2), we can find constants $\chi^2_{1\alpha_j}(n_j) = \chi^2_{1\alpha_j}$ (say) and $\chi^2_{2\alpha_j}(n_j) = \chi^2_{2\alpha_j}$ (say), for $j = 0, 1, \ldots, k$, such that the simultaneous statements

$$\chi^2_{1\alpha_0} \le q_0/\lambda_0 \le \chi^2_{2\alpha_0}, \quad \chi^2_{1\alpha_1} \le q_1/\lambda_1 \le \chi^2_{2\alpha_1}, \quad \ldots, \quad \chi^2_{1\alpha_k} \le q_k/\lambda_k \le \chi^2_{2\alpha_k} \tag{2.6.1}$$

have a joint confidence coefficient $(1 - \alpha) = \prod_{j=0}^{k}(1 - \alpha_j)$, where

$$P(\chi^2_{1\alpha_j} \le \chi^2_{(n_j)} \le \chi^2_{2\alpha_j}) = 1 - \alpha_j,$$

and $\chi^2_{(n_j)}$ denotes the central $\chi^2$ variate with $n_j$ degrees of freedom ($j = 0, 1, \ldots, k$). By inverting the statements (2.6.1), we obtain, with a joint confidence coefficient $(1 - \alpha)$, the simultaneous confidence statements

$$c_{1\alpha_0}q_0 \le \lambda_0 \le c_{2\alpha_0}q_0, \quad c_{1\alpha_1}q_1 \le \lambda_1 \le c_{2\alpha_1}q_1, \quad \ldots, \quad c_{1\alpha_k}q_k \le \lambda_k \le c_{2\alpha_k}q_k, \tag{2.6.2}$$

where $c_{1\alpha_j} = [\chi^2_{2\alpha_j}]^{-1}$ and $c_{2\alpha_j} = [\chi^2_{1\alpha_j}]^{-1}$ for $j = 0, 1, \ldots, k$. Recalling (2.4.4), we can obtain the following set of simultaneous confidence interval statements, which are implied by (2.6.2), on the variance components:

$$c_{1\alpha_0}q_0 \le \sigma^2 \le c_{2\alpha_0}q_0, \qquad \frac{1}{\nu_1}[c_{1\alpha_1}q_1 - c_{2\alpha_0}q_0] \le \sigma_1^2 \le \frac{1}{\nu_1}[c_{2\alpha_1}q_1 - c_{1\alpha_0}q_0], \quad \ldots,$$
$$\frac{1}{\nu_k}[c_{1\alpha_k}q_k - c_{2\alpha_0}q_0] \le \sigma_k^2 \le \frac{1}{\nu_k}[c_{2\alpha_k}q_k - c_{1\alpha_0}q_0]. \tag{2.6.3}$$

Since (2.6.2) implies (2.6.3), it follows that the confidence coefficient associated with the statements (2.6.3) is $\ge (1 - \alpha)$. In order to be non-trivial, of course, the constants $c_{1\alpha_j}, c_{2\alpha_j}$ (for $j = 0, 1, \ldots, k$) must be such that all the bounds in (2.6.3) are non-negative.
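The inversion from (2.6.1) through (2.6.3) is mechanical once the $\chi^2$ constants are fixed. A minimal Python sketch (function names hypothetical; the $\chi^2$ constants must come from tables, and the values in the usage line are placeholders, not percentage points):

```python
def variance_component_bounds(q, nu, chi2_lower, chi2_upper):
    """Invert (2.6.1) into (2.6.2) and then (2.6.3).

    q[j]           sums of squares q_0, ..., q_k
    nu[i]          multipliers nu_i of (2.4.4) for i >= 1 (nu[0] unused)
    chi2_lower[j]  chi^2_{1 alpha_j}; chi2_upper[j] = chi^2_{2 alpha_j}, with
                   P(lower <= chi^2_(n_j) <= upper) = 1 - alpha_j (from tables)
    Returns (bounds on sigma^2, [bounds on sigma_i^2 for i = 1..k]).
    """
    c1 = [1.0 / u for u in chi2_upper]      # c_{1 alpha_j}
    c2 = [1.0 / l for l in chi2_lower]      # c_{2 alpha_j}
    sigma2 = (c1[0] * q[0], c2[0] * q[0])
    comps = [((c1[i] * q[i] - c2[0] * q[0]) / nu[i],
              (c2[i] * q[i] - c1[0] * q[0]) / nu[i]) for i in range(1, len(q))]
    return sigma2, comps

def joint_coefficient(alphas):
    """(1 - alpha) = product of (1 - alpha_j), by independence of the q_j."""
    out = 1.0
    for alpha_j in alphas:
        out *= 1.0 - alpha_j
    return out

# Placeholder constants (not real chi-square percentage points):
s2, comps = variance_component_bounds(q=[4.0, 60.0], nu=[None, 2.0],
                                      chi2_lower=[0.5, 1.0], chi2_upper=[8.0, 5.0])
print(s2, comps)   # (0.5, 8.0) [(2.0, 29.75)]
```

Each bound on $\sigma_i^2$ pairs the lower end for $\lambda_i$ with the upper end for $\lambda_0$ (and vice versa), which is why (2.6.3) is implied by, but weaker than, (2.6.2).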

As a simple extension, we can also obtain simultaneous confidence interval statements on $\mu = \mu_1 + \mu_2 + \cdots + \mu_k$ and the variance components.

Next we shall obtain simultaneous confidence bounds on the ratios $\sigma_1^2/\sigma^2, \sigma_2^2/\sigma^2, \ldots, \sigma_k^2/\sigma^2$. When the $q_i$'s satisfy the restrictions (2.3.1) and (2.3.2), then, for $i = 1, 2, \ldots, k$, the ratios $F_i = (q_i/n_i\lambda_i)/(q_0/n_0\lambda_0)$ are quasi-independent in the sense of Section 1.3, each having a central $F$-distribution with $n_i$ and $n_0$ degrees of freedom. The joint distribution of these quasi-independent $F_i$'s is known [7, 3], and we can determine constants $F_{1i}, F_{2i}$, for $i = 1, 2, \ldots, k$, such that the simultaneous statements

$$F_{11} \le \frac{q_1/n_1\lambda_1}{q_0/n_0\lambda_0} \le F_{21}, \quad \ldots, \quad F_{1k} \le \frac{q_k/n_k\lambda_k}{q_0/n_0\lambda_0} \le F_{2k} \tag{2.6.4}$$

have a joint probability $(1 - \alpha)$, for a preassigned $\alpha$. Recalling (2.4.4), we can invert (2.6.4) to obtain the simultaneous confidence interval statements

$$\frac{1}{\nu_1}\Big[\frac{q_1/n_1}{F_{21}\,q_0/n_0} - 1\Big] \le \frac{\sigma_1^2}{\sigma^2} \le \frac{1}{\nu_1}\Big[\frac{q_1/n_1}{F_{11}\,q_0/n_0} - 1\Big], \quad \ldots,$$
$$\frac{1}{\nu_k}\Big[\frac{q_k/n_k}{F_{2k}\,q_0/n_0} - 1\Big] \le \frac{\sigma_k^2}{\sigma^2} \le \frac{1}{\nu_k}\Big[\frac{q_k/n_k}{F_{1k}\,q_0/n_0} - 1\Big], \tag{2.6.5}$$

with a joint confidence coefficient $(1 - \alpha)$. Here, again, for non-triviality the bounds should all be non-negative.
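The inversion of (2.6.4) into (2.6.5) uses only (2.4.4), since $\lambda_i/\lambda_0 = 1 + \nu_i\sigma_i^2/\sigma^2$. A short Python sketch; the function name is hypothetical, and the constants $F_{1i}, F_{2i}$, which in practice come from the joint distribution of the quasi-independent ratios [7, 3], are supplied as placeholders:

```python
def variance_ratio_bounds(q, dof, nu, f_lower, f_upper):
    """Invert (2.6.4) into (2.6.5).

    With G_i = (q_i/n_i)/(q_0/n_0), the statement
    F_{1i} <= G_i * lambda_0/lambda_i <= F_{2i} is equivalent to
    (G_i/F_{2i} - 1)/nu_i <= sigma_i^2/sigma^2 <= (G_i/F_{1i} - 1)/nu_i.
    f_lower[i], f_upper[i] stand for F_{1i}, F_{2i}; index 0 of nu,
    f_lower and f_upper is unused.
    """
    base = q[0] / dof[0]
    bounds = []
    for i in range(1, len(q)):
        g = (q[i] / dof[i]) / base
        bounds.append(((g / f_upper[i] - 1.0) / nu[i],
                       (g / f_lower[i] - 1.0) / nu[i]))
    return bounds

# Placeholder constants, not tabulated percentage points:
print(variance_ratio_bounds(q=[4.0, 28.0], dof=[2, 2], nu=[None, 2.0],
                            f_lower=[None, 0.5], f_upper=[None, 3.5]))  # [(0.5, 6.5)]
```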


REFERENCES

[1] O. CARPENTER, "Note on the extension of Craig's theorem to non-central variates," Ann. Math. Stat., Vol. 21 (1950), pp. 455-457.

[2] C. EISENHART, "The assumptions underlying the analysis of variance," Biometrics, Vol. 3 (1947), pp. 1-21.

[3] R. GNANADESIKAN, "Contributions to multivariate analysis including univariate and multivariate variance components analysis and factor analysis," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 158 (1956).

[4] F. A. GRAYBILL AND A. W. WORTHAM, "A note on uniformly best unbiased estimators for variance components," J. Amer. Stat. Assn., Vol. 51 (1956), pp. 266-268.

[5] E. LEHMANN AND H. SCHEFFÉ, "Completeness, similar regions and unbiased estimation, Part I," Sankhya, Vol. 10 (1950), pp. 305-340.

[6] J. OGAWA, "On the independence of quadratic forms in a non-central normal system," Osaka Math. Jour., Vol. 2 (1950), pp. 151-159.

[7] K. V. RAMACHANDRAN, "On the simultaneous analysis of variance test," Ann. Math. Stat., Vol. 27 (1956), pp. 521-528.

[8] S. N. ROY, "Univariate and multivariate analysis as problems in testing of composite hypotheses I," Sankhya, Vol. 10 (1950), pp. 29-80.

[9] S. N. ROY, "On a heuristic method of test construction and its uses in multivariate analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.

[10] S. N. ROY AND R. C. BOSE, "Simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 24 (1953), pp. 513-536.

[11] S. N. ROY, "A report on some aspects of multivariate analysis," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 121 (1954).

[12] S. N. ROY AND R. GNANADESIKAN, "Further contributions to multivariate confidence bounds," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 155 (1956). Also Biometrika, Vol. 44 (1957), pp. 399-410.








SOME CONTRIBUTIONS TO ANOVA IN ONE OR MORE 
DIMENSIONS: II 


By S. N. ROY AND R. GNANADESIKAN¹


University of North Carolina 


0. Introduction and notation. This paper presents certain natural extensions, to the multi-dimensional or multivariate situation, of the results contained in the first paper [10] by the authors. We shall use the same notation as before and, in addition, the following notation: $c(A)$ will denote all the characteristic roots of the matrix $A$, and if $A$ is at least positive semi-definite, then $c_{\min}(A)$ and $c_{\max}(A)$ will denote, respectively, the smallest and the largest of these roots; $D_a(p \times p)$ will denote a diagonal matrix whose diagonal elements are $a_1, a_2, \ldots, a_p$; $T(p \times p)$ will denote a triangular matrix whose non-zero elements are along and below the diagonal; $|A|$ will denote the determinant of a square matrix $A$; and $A(p \times p)\,\dot\times\,B(q \times q)$ will denote the Kronecker product or right direct product [5] of the matrices $A$ and $B$. Also, $\min(p, q)$ will denote the lesser of the two real numbers $p$ and $q$.








1. Résumé of problems and results under the multivariate Model I of ANOVA.
1.1 The multivariate Model I. Let $X(p \times n) = [x_1\ x_2\ \cdots\ x_n]$ be a set of $n$ observable stochastic $p$-vectors such that

$$X'(n\times p) = A(n\times m)\,\xi(m\times p) + \epsilon(n\times p) = [A_1(n\times r)\ \ A_2(n\times(m-r))]\begin{pmatrix}\xi_1(r\times p)\\ \xi_2((m-r)\times p)\end{pmatrix} + \epsilon(n\times p)\ \text{(say)}, \qquad m < n, \tag{1.1.1}$$

where, as in the univariate situation, $A$ is the design matrix with $\operatorname{rank}(A) = r \le m < n$, and $A_1$ is a basis of $A$, with a consequent partitioning of $\xi$ into $\xi_1(r \times p)$ and $\xi_2((m - r) \times p)$, and where

(i) $\xi(m \times p)$ is a set of unknown parameters;

(ii) $\epsilon(n \times p)$, whose elements are physically of the nature of errors, is a random sample of size $n$ from the non-singular $p$-variate normal $N[0(p \times 1), \Sigma(p \times p)]$. Furthermore, we assume here that $p \le n - r$.

Under this model it is seen that $x_i(p \times 1)$, for $i = 1, 2, \ldots, n$, are $n$ independent stochastic $p$-vectors such that $x_i$ is $N[E(x_i), \Sigma]$, where the unknown dispersion matrix $\Sigma(p \times p)$ is the same for all the $n$ vectors, and $E(x_i)$, for $i = 1, 2, \ldots, n$, is given by

$$E[X'](n\times p) = A(n\times m)\,\xi(m\times p) = [A_1\ \ A_2]\begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix}. \tag{1.1.2}$$
Received July 24, 1957.
¹ This research was sponsored by the United States Air Force through the Office of Scientific Research of the Air Research and Development Command.







ANOVA IN ONE OR MORE DIMENSIONS: II 319 


Here again, as in the univariate case, the assumption of normality in (ii) of 
the above model is not necessary for purposes of linear estimation. In fact, 
since the linear estimation part of the present problem can be easily handled 
and may not be of much additional interest, we skip it and proceed directly to 
the solutions of the problems of testing of linear hypotheses and the associated 
confidence bounds. 


1.2 Testing of linear hypotheses. The hypothesis that we seek to test, under the model of Section 1.1, is

$$H_0: C(s\times m)\,\xi(m\times p)\,M(p\times u) = 0(s\times u), \quad\text{or}\quad [C_1(s\times r)\ \ C_2(s\times(m-r))]\begin{pmatrix}\xi_1(r\times p)\\ \xi_2((m-r)\times p)\end{pmatrix}M(p\times u) = 0(s\times u), \tag{1.2.1}$$

against

$$H: C\xi M = \eta(s\times u) \ne 0(s\times u),$$

where $C$ and $M$ are matrices given by the hypothesis, and hence called the hypothesis matrices, such that $\operatorname{rank}(C) = s \le r \le m < n$ and $\operatorname{rank}(M) = u \le p$, and $\eta(s \times u)$ is an arbitrary unspecified non-null matrix. Notice that $s$ may be greater than, equal to, or less than $u$. One can, of course, verify that (1.2.1) is by no means the most general type of linear hypothesis imaginable, although it includes a wide variety of linear hypotheses in which we might be interested. The main results follow [12, pp. 84a-84i].

(1.2.2) All the following results are invariant under the choice of a basis $A_1$ of $A$ (with a consequent determination of $\xi_1$ and $C_1$).

(1.2.3) Whether we use the likelihood ratio criterion or the one used by the authors [12], we have a similar notion of testability for this situation as for the univariate case, and the testability condition is the same as (1.3.3) of [10].
(1.2.4) The test itself is given by the following rule:

Reject $H_0$ against $H$ if $c_{\max}(S_1S^{-1}) \ge c_\alpha(u, s, n - r)$, and accept (do not reject) $H_0$ against $H$ otherwise, where $c_{\max}(S_1S^{-1})$ denotes the largest characteristic root (necessarily positive except on a set of probability measure zero) of $S_1S^{-1}$, and $c_\alpha(u, s, n - r)$, to be called $c_\alpha$ for short, is a constant which depends on the level of significance $\alpha$ and the degrees of freedom $u$, $s$ and $(n - r)$, and which is being tabulated from the relation

$$P[c_{\max}(S_1S^{-1}) \ge c_\alpha \mid H_0] = \alpha, \tag{1.2.5}$$

the distribution involved being long available [11, 12]. Here $S_1$ is a $u \times u$ symmetric and at least positive semi-definite matrix of rank, almost everywhere (i.e., except on a set of probability measure zero), $\min(u, s)$, given by

$$sS_1(u\times u) = M'XA_1(A_1'A_1)^{-1}C_1'[C_1(A_1'A_1)^{-1}C_1']^{-1}C_1(A_1'A_1)^{-1}A_1'X'M, \tag{1.2.6}$$

and $S$ is a $u \times u$ symmetric and, almost everywhere, positive definite matrix of rank $u$ (necessarily), given by

$$(n - r)S(u\times u) = M'X[I(n) - A_1(A_1'A_1)^{-1}A_1']X'M. \tag{1.2.7}$$







We shall call the matrix on the right side of (1.2.6) the matrix due to the 
hypothesis (1.2.1) and the matrix on the right side of (1.2.7) the matrix due to 
error. 
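For $u = 2$ the statistic in (1.2.4) has a closed form. The following Python sketch (function names hypothetical; $S_1$ and $S$ are assumed to be already formed from (1.2.6) and (1.2.7), and $c_\alpha$ must be read from the tables referred to in this section) computes $c_{\max}(S_1S^{-1})$ and applies the rule:

```python
import math

def largest_root_2x2(S1, S):
    """c_max(S1 S^{-1}) for 2 x 2 symmetric S1 (at least p.s.d.) and S (p.d.).

    The roots of M = S1 S^{-1} solve t^2 - tr(M) t + det(M) = 0; they equal
    the roots of S^{-1/2} S1 S^{-1/2}, so they are real and non-negative.
    """
    det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det_s, -S[0][1] / det_s],
           [-S[1][0] / det_s, S[0][0] / det_s]]
    M = [[sum(S1[i][t] * inv[t][j] for t in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)   # clamp tiny negative rounding error
    return (tr + math.sqrt(disc)) / 2.0

def reject_h0(S1, S, c_alpha):
    """Rule (1.2.4): reject H0 when c_max(S1 S^{-1}) >= c_alpha."""
    return largest_root_2x2(S1, S) >= c_alpha

print(largest_root_2x2([[6.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 1.0]]))  # 3.0
```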

The reduction, to a canonical form, of the relevant distribution problem is one in which the characteristic roots of $S_1S^{-1}$ are the same as those of $[((n - r)/s)Y_1Y_1'(YY')^{-1}]$, where $Y_1(u \times s)$ and $Y(u \times (n - r))$ have, in general, i.e., under $H$, the distribution


min(u,s) 


mh { , 
oe en | > (Y1¥i+ YY’) + a Yi 


(1.2.8) min(u,s) 
—2 2 (Y)ev7h | dY,dY, 
where the $\gamma_i$'s ($i = 1, 2, \ldots, \min(u, s)$) are the possibly non-zero characteristic roots of the $u \times u$ matrix $\eta'[C_1(A_1'A_1)^{-1}C_1']^{-1}\eta(M'\Sigma M)^{-1}$. It is to be noted that the $u$ characteristic roots of this matrix are all non-negative and $t$ of them are positive, while the rest, $(u - t)$ in number, are zero, where $t$ $(\le \min(u, s))$ is the rank of $\eta$. All the roots are zero if, and only if, $\eta = 0$, i.e., under $H_0$, and in this case we have, for $Y_1$ and $Y$, the distribution

$$(2\pi)^{-u(s + n - r)/2}\exp\big[-\tfrac{1}{2}\operatorname{tr}(Y_1Y_1' + YY')\big]\,dY_1\,dY. \tag{1.2.9}$$

The distribution of $c_{\max}(S_1S^{-1})$, i.e., of $c_{\max}[((n - r)/s)Y_1Y_1'(YY')^{-1}]$, on the null hypothesis $H_0$, was obtained earlier [11, 12] starting from (1.2.9), and this forms the basis of the tables, now being prepared, giving $c_\alpha(u, s, n - r)$ when $\alpha$, $u$, $s$ and $(n - r)$ are prescribed. It may be noted, from (1.2.8) and (1.2.9), that $Y_1$ and $Y$ are independently distributed.

We can introduce here, just as in the univariate case, the notion of two or more different hypotheses like (1.2.1) being testable in a quasi-independent manner, and can derive a set of necessary and sufficient conditions for this. In fact, when the hypotheses differ only in their $C$ matrices and have the same $M$ matrix, so that we have, for instance,

$$H_{0i}: C_i(s_i\times m)\,\xi(m\times p)\,M(p\times u) = 0(s_i\times u), \quad i = 1, 2, \ldots, k, \tag{1.2.10}$$

against respective alternatives $H_i$, like $H$ defined under (1.2.1), where $\operatorname{rank}(C_i) = \operatorname{rank}[C_{i1}\ \ C_{i2}] = s_i$, with $C_{i1}$ and $C_{i2}$ having $r$ and $(m - r)$ columns respectively, and $\sum_{i=1}^{k}s_i \le r \le m < n$, then the necessary and sufficient condition for being able to test the $k$ hypotheses (1.2.10) in a quasi-independent manner is that

$$C_{i1}(A_1'A_1)^{-1}C_{j1}' = 0(s_i\times s_j), \quad i \ne j = 1, 2, \ldots, k, \tag{1.2.11}$$

which is the same as condition (1.3.7) of [10] for the univariate case.

1.3 The associated confidence bounds. Going back to (1.2.1), we observe that $\eta$ $(\ne 0)$ represents a deviation from $H_0$. The main results follow [9].

With a joint confidence coefficient $\ge (1 - \alpha)$, for a preassigned $\alpha$, we have the following simultaneous confidence bounds:

$$c_{\max}^{1/2}(sS_1) - [sc_\alpha c_{\max}(S)]^{1/2} \le c_{\max}^{1/2}\{\eta'[C_1(A_1'A_1)^{-1}C_1']^{-1}\eta\} \le c_{\max}^{1/2}(sS_1) + [sc_\alpha c_{\max}(S)]^{1/2}, \tag{1.3.1}$$

and similar confidence bounds in terms of the same $c_\alpha$ but of truncated $\eta$, $S_1$ and $S$, obtained by going back to (1.2.1) for $\eta$, to (1.2.6) for $S_1$ and to (1.2.7) for $S$, and then (i) cutting out any row of $\eta'$ and the corresponding row of $M'$, or any two rows of $\eta'$ and the corresponding two rows of $M'$, and so on, until we get down to just any row of $\eta'$ and the corresponding row of $M'$; (ii) cutting out any column of $\eta'$ and the corresponding row of $C_1$, any two columns of $\eta'$ and the two corresponding rows of $C_1$, and so on; and finally (iii) combining any case of truncation under (i) with any case of truncation under (ii). Thus, with a joint probability $\ge (1 - \alpha)$, we have $(2^u - 1) \times (2^s - 1)$ statements, of which (1.3.1) is the first one.

As in the univariate case [10, Section 1.4], it is to be observed that, taken together, the $(2^u - 1) \times (2^s - 1)$ confidence statements enable us to put confidence bounds not only on parametric functions which are measures of deviation from the total null hypothesis (1.2.1) but also on all the component parts of it.

It may be noted that if, in the $H_0$ of (1.2.1), $M(p \times u)$ is not present, as will be seen in the hypotheses of Section 2.4, then all the above results go through if we replace $u$ by $p$ and $M(p \times u)$ by the identity matrix $I(p)$.


2. Multivariate Variance Components.
2.1 The multivariate Model II. Let $X(p \times n) = [x_1\ x_2\ \cdots\ x_n]$ be a set of $n$ observable stochastic $p$-vectors such that

$$X'(n\times p) = A(n\times m)\,\xi(m\times p) + \epsilon(n\times p) = [A_1(n\times m_1)\ \cdots\ A_k(n\times m_k)]\begin{pmatrix}\xi_1(m_1\times p)\\ \vdots\\ \xi_k(m_k\times p)\end{pmatrix} + \epsilon(n\times p), \qquad \sum_{i=1}^{k}m_i = m\ \text{(say)}, \quad m < n, \tag{2.1.1}$$

where $A$ is the design matrix of rank $r \le m < n$, and where

(i) $\xi_i(m_i \times p)$ is a random sample of size $m_i$ from the $p$-variate non-singular normal population $N[\mu_i(p \times 1), \Sigma_i(p \times p)]$ for $i = 1, 2, \ldots, k$, and $\epsilon(n \times p)$ and the $\xi_i$'s ($i = 1, 2, \ldots, k$) are mutually independent;

(ii) $\epsilon(n \times p)$, whose elements are physically of the nature of errors, is a random sample of size $n$ from the $p$-variate non-singular normal $N[0(p \times 1), \Sigma(p \times p)]$. Furthermore, we assume that $p \le n - r$.

Writing

$$X'(n\times p) = [x_{(1)}\ x_{(2)}\ \cdots\ x_{(p)}] \quad\text{and}\quad x(pn\times 1) = \begin{pmatrix}x_{(1)}\\ x_{(2)}\\ \vdots\\ x_{(p)}\end{pmatrix},$$







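we see that, under the above model, the elements of $X'(n \times p)$, i.e., of $x(pn \times 1)$, have a $pn$-variate non-singular normal distribution $N[E(x), \Sigma^*(pn \times pn)]$, where

$$E(x)(pn\times 1) = A^*(pn\times pm)\begin{pmatrix}\mu_{11}1(m_1\times 1)\\ \vdots\\ \mu_{k1}1(m_k\times 1)\\ \vdots\\ \mu_{1p}1(m_1\times 1)\\ \vdots\\ \mu_{kp}1(m_k\times 1)\end{pmatrix}, \tag{2.1.2}$$

and where

$$\mu_i(p\times 1) = \begin{pmatrix}\mu_{i1}\\ \vdots\\ \mu_{ip}\end{pmatrix}, \qquad A^*(pn\times pm) = A(n\times m)\,\dot\times\,I(p) = \begin{pmatrix}A & 0 & \cdots & 0\\ 0 & A & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & A\end{pmatrix},$$

and

$$\Sigma^*(pn\times pn) = E(xx') - E(x)E(x') = A_1A_1'\,\dot\times\,\Sigma_1 + A_2A_2'\,\dot\times\,\Sigma_2 + \cdots + A_kA_k'\,\dot\times\,\Sigma_k + I(n)\,\dot\times\,\Sigma, \tag{2.1.3}$$

if we recall and use the Kronecker product notation $A\,\dot\times\,B$.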
We shall, in this paper, consider in detail only the relatively more restricted model wherein

$$\Sigma_i(p\times p) = \sigma_i^2\,\Sigma(p\times p), \quad i = 1, 2, \ldots, k, \tag{2.1.4}$$

since, as will be shown in Section 2.3, the more general set-up of the model defined above does not lend itself to an easy mathematical treatment. We shall call the model defined at the beginning of this section, taken together with the restriction (2.1.4), the restricted multivariate Model II of ANOVA. Federer [3] points out that models where dispersion matrices are proportional have been tentatively proposed for a certain type of genetical problem, so that our restricted model might still be meaningful in certain physical situations.²
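Under (2.1.4) the dispersion matrix (2.1.3) collapses into a single direct product. Since the direct product is linear in each factor, a short derivation is

$$\Sigma^* = \sum_{i=1}^{k}A_iA_i'\,\dot\times\,\sigma_i^2\Sigma + I(n)\,\dot\times\,\Sigma = \Big[\sum_{i=1}^{k}\sigma_i^2A_iA_i' + I(n)\Big]\dot\times\,\Sigma,$$

so that $\Sigma^*$ is the direct product of $\Sigma$ with the univariate Model II dispersion matrix of [10] (its (2.1.3)) evaluated at $\sigma^2 = 1$.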





² Since this paper was written up and submitted in July, 1957, further investigation showed that even without this (rather severe and unrealistic) restriction it was still possible to go ahead with (i) point estimation, (ii) testing of hypotheses and (iii) confidence interval estimation, but in terms of a different set of statistics leading to results less sharp than those aimed at here. The mathematical tools needed are those given here plus some further tools. Thus, from a physical standpoint, this paper might be regarded as an indispensable first step toward handling the more realistic situation that does not involve the very restrictive assumption of proportionality. The justification of the present paper from a physical standpoint, in terms of a possible genetical application, is thus today entirely redundant.







Our objectives will be: (i) to estimate any estimable linear function of the elements of $\mu_1, \ldots, \mu_k$ and to test testable linear hypotheses on $\mu_1, \ldots, \mu_k$; (ii) to obtain estimates of, and test hypotheses on, the multivariate variance components, viz., the characteristic roots $c(\Sigma_1), c(\Sigma_2), \ldots, c(\Sigma_k)$ and $c(\Sigma)$; and (iii) to obtain confidence bounds (simultaneous and/or separate) on $c(\Sigma_1), c(\Sigma_2), \ldots, c(\Sigma_k)$ and $c(\Sigma)$. Under the restricted Model II, of course, (ii) is equivalent to obtaining estimates of, and testing hypotheses on, $\sigma_1^2, \ldots, \sigma_k^2$ and $c(\Sigma)$, while (iii) is equivalent to obtaining confidence bounds on $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ and $c(\Sigma)$.

2.2 Linear estimation and testing of linear hypotheses. Recall that for the restricted k-way classification the design matrix $A(n\times m)$ is such that, for all $\alpha = 1, 2, \cdots, k$, the submatrix $A_\alpha(n\times m_\alpha)$ has one and only one non-zero element, equal to unity, in each row, and $\operatorname{rank}(A) = m - k + 1$. When we select n individuals under this design and measure each on not one but p variates, we have a multivariate restricted k-way classification analogous to the univariate restricted k-way classification discussed in [10]. Multivariate analogues of the usual complete and incomplete connected designs are included under this general case. Using a result given in [12], we can establish the following lemma [4, pp. 96-97]:

Lemma 1: For the multivariate restricted k-way classification, the necessary and sufficient condition for the estimability of $\sum_{i=1}^k l_i(1\times p)\,\mu_i(p\times 1)$ is that

$$l_1(1\times p) = l_2(1\times p) = \cdots = l_k(1\times p),$$

so that a linear function $\sum_{i=1}^k l_i(1\times p)\,\mu_i(p\times 1)$ of all the elements of $\mu_1, \cdots, \mu_k$ which is estimable, and hypotheses on which are testable, is of the form $l(1\times p)(\mu_1 + \cdots + \mu_k)$; hence, neither linear functions of the elements of each $\mu_i$ nor the elements of each $\mu_i$ are separately estimable, and linear hypotheses on these separate functions or elements are not testable.




2.3. Estimation of the multivariate variance components. Analogous to the univariate $\chi^2$-distribution, we shall introduce, for the multivariate situation, the pseudo-Wishart distribution, a definition of which follows.

Suppose $X(p\times n)$ has the distribution

$$(2\pi)^{-pn/2}\,|\Sigma|^{-n/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}(X-\xi)(X'-\xi')\right]dX, \tag{2.3.1}$$

where the elements $x_{ij}$ of X are such that $-\infty < x_{ij} < \infty$ ($i = 1, \cdots, p$; $j = 1, \cdots, n$), so that $E(X) = \xi(p\times n)$, and the symmetric positive definite matrix $\Sigma(p\times p)$ is interpreted through $n\Sigma(p\times p) = E[(X-\xi)(X'-\xi')]$. Then we shall call the distribution of the symmetric, at least positive semi-definite matrix $S(p\times p) = (1/n)XX'$ the pseudo-Wishart distribution with n degrees of freedom; the distribution is central or non-central according as $\xi = 0$ or $\xi \neq 0$, i.e., according as $\xi\xi'$ is or is not the null matrix $O(p\times p)$. Conversely, we shall say that any symmetric, at least positive semi-definite matrix $S(p\times p)$ has the pseudo-Wishart distribution (in general, non-central) with degrees of




324 S. N. ROY AND R. GNANADESIKAN


freedom n if we can write $S(p\times p) = (1/n)X(p\times n)X'(n\times p)$, where $X(p\times n)$ has the distribution (2.3.1). Further, if $E(X)E(X')$ is the null matrix $O(p\times p)$ then, and only then, will the distribution be said to be central. In particular, if in the above definition $p \le n$ and $\operatorname{rank}(S) = p$, then a pseudo-Wishart distribution for S is equivalent to the ordinary Wishart distribution.
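As a minimal numerical sketch of the definition (Python with NumPy; the sampler name and the particular $\Sigma$ are illustrative assumptions, not part of the paper), one can draw X column by column from (2.3.1) and form $S = (1/n)XX'$. When $n < p$ the resulting S is singular, which is the case the word "pseudo" is meant to cover; when $n \ge p$ it has full rank almost surely, and in the central case nS is an ordinary Wishart matrix.

```python
import numpy as np

def pseudo_wishart_sample(xi, sigma, rng):
    """Return S = (1/n) X X', where the n columns of X(p x n) are independent
    N_p(xi[:, j], sigma), i.e. X has the density (2.3.1)."""
    p, n = xi.shape
    chol = np.linalg.cholesky(sigma)                  # sigma = chol @ chol.T
    x = xi + chol @ rng.standard_normal((p, n))
    return x @ x.T / n

rng = np.random.default_rng(0)
sigma = np.eye(4) + 0.5                               # I + 0.5 J: positive definite, p = 4
s_few = pseudo_wishart_sample(np.zeros((4, 2)), sigma, rng)    # n = 2 < p
s_many = pseudo_wishart_sample(np.zeros((4, 10)), sigma, rng)  # n = 10 >= p
print(np.linalg.matrix_rank(s_few), np.linalg.matrix_rank(s_many))  # 2 4
```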

Starting from (2.3.1), it can be shown that the ith diagonal element ($i = 1, 2, \cdots, p$) of $XX'$, where of course $(1/n)XX'$ has a pseudo-Wishart distribution with n degrees of freedom, is distributed as $\sigma_{ii}\chi^2_{(n)}$, where $\sigma_{ii}$ denotes the ith diagonal element of $\Sigma(p\times p)$ and $\chi^2_{(n)}$ stands for a $\chi^2$ variate with n degrees of freedom, central or non-central according as the pseudo-Wishart distribution of $(1/n)XX'$ is central or non-central.
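A quick Monte Carlo check of this diagonal-element statement in the central case ($\xi = 0$), with an arbitrary $\Sigma$ chosen only for illustration: each diagonal element of $XX'$ divided by $\sigma_{ii}$ should show the mean n and variance 2n of a central $\chi^2$ variate with n degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 5, 200_000
sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 0.5]])                   # hypothetical Sigma (diagonally dominant, so p.d.)
chol = np.linalg.cholesky(sigma)
x = chol @ rng.standard_normal((reps, p, n))          # reps independent central X(p x n)
diag_xxt = np.einsum("rij,rij->ri", x, x)             # diagonal of X X' for each replicate
ratio = diag_xxt / np.diag(sigma)                     # should behave like chi^2 with n d.f.
print(ratio.mean(axis=0), ratio.var(axis=0))          # close to [5 5 5] and [10 10 10]
```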

We shall next proceed to problems of estimating and testing hypotheses on the multivariate variance components. For these purposes, by analogy with the univariate case [10, section 2.3], we shall seek (k + 1) matrices $S_i(p\times p) = (1/n_i)X(p\times n)Q_i(n\times n)X'(n\times p)$ ($i = 0, 1, \cdots, k$), where $Q_i(n\times n)$ is symmetric and at least positive semi-definite of rank $n_i \le n$, with $\sum_{i=0}^k n_i \le n$, such that

(2.3.2) $(1/\lambda_i)S_i$, of rank $\le p$, has a central pseudo-Wishart distribution with $n_i$ degrees of freedom ($i = 0, 1, \cdots, k$), where $\lambda_i$ is a positive constant;

(2.3.3) $(1/\lambda_0)S_0, (1/\lambda_1)S_1, \cdots, (1/\lambda_k)S_k$ are mutually independent.
Lemma 2: If $X(p\times n)$ has the distribution (2.3.1), then a set of necessary and sufficient conditions for $S_0, S_1, \cdots, S_k$ to satisfy the above restrictions is given by

(a) $Q_i^2(n\times n) = \lambda_iQ_i$, which is equivalent to the statement that $Q_i = \lambda_iL_i'L_i$, where $L_i(n_i\times n)L_i'(n\times n_i) = I(n_i)$ ($i = 0, 1, \cdots, k$);

(b) $E(X)Q_iE(X') = O(p\times p)$;

(c) $Q_iQ_j = O(n\times n)$, which, taken together with (a), is equivalent to the statement that $L_i(n_i\times n)L_j'(n\times n_j) = O(n_i\times n_j)$ ($i \ne j = 0, 1, \cdots, k$).
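The matrices described in (a) and (c) are easy to construct numerically. In the sketch below (the sizes $n_i$ and constants $\lambda_i$ are hypothetical), disjoint groups of rows of a random orthogonal matrix play the role of $L_0, L_1, L_2$; the resulting $Q_i = \lambda_iL_i'L_i$ then satisfy $Q_i^2 = \lambda_iQ_i$ and $Q_iQ_j = O$, with $\operatorname{rank}(Q_i) = n_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
sizes = [3, 2, 1]            # hypothetical ranks n_0, n_1, n_2 (sum <= n)
lambdas = [1.0, 2.5, 4.0]    # hypothetical positive constants lambda_i
# Rows of a random orthogonal matrix give L_0, L_1, L_2 with
# L_i L_i' = I(n_i) and L_i L_j' = O for i != j.
orth, _ = np.linalg.qr(rng.standard_normal((n, n)))
rows = np.split(orth.T[: sum(sizes)], np.cumsum(sizes)[:-1])
qs = [lam * l.T @ l for lam, l in zip(lambdas, rows)]    # Q_i = lambda_i L_i' L_i

for i, (q, lam) in enumerate(zip(qs, lambdas)):
    assert np.allclose(q @ q, lam * q)                    # condition (a)
    for j in range(i + 1, len(qs)):
        assert np.allclose(q @ qs[j], 0.0)                # condition (c)
print([int(np.linalg.matrix_rank(q)) for q in qs])        # [3, 2, 1]
```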

Proof: Necessity of (a), (b), and (c). Suppose that the matrices $S_0, S_1, \cdots, S_k$ satisfy (2.3.2) and (2.3.3). Then, since

$$(1/\lambda_i)S_i = (1/n_i\lambda_i)XQ_iX' = (1/n_i)XP_iX'$$

(where $P_i(n\times n) = (1/\lambda_i)Q_i(n\times n)$) has a central pseudo-Wishart distribution with $n_i$ degrees of freedom, therefore, by our previous discussion, the jth ($j = 1, 2, \cdots, p$) diagonal element of $XP_iX'$ is necessarily distributed as a constant times $\chi^2_{(n_i)}$, where $\chi^2_{(n_i)}$ denotes a (central) $\chi^2$ variate with $n_i$ degrees of freedom. Now, if

$$X(p\times n) = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_p\end{bmatrix}$$

(say) has the distribution (2.3.1), then $x_j(1\times n)$ has the distribution

$$(2\pi\sigma_{jj})^{-n/2}\exp\left[-\frac{1}{2\sigma_{jj}}(x_j-\xi_j)(x_j'-\xi_j')\right]dx_j,$$

where

$$\xi(p\times n) = E(X(p\times n)) = \begin{bmatrix}\xi_1\\ \vdots\\ \xi_p\end{bmatrix}$$

and $\sigma_{jj}$ is the jth diagonal element of $\Sigma(p\times p)$. Using the results of [2, 6], we have that, if $x_j$ has the above distribution, then, in order that the jth diagonal element of $XP_iX'$, i.e., $x_jP_ix_j'$, where $P_i(n\times n)$ is symmetric, at least positive semi-definite of rank $n_i \le n$, may be distributed as $\sigma_{jj}\chi^2_{(n_i)}$, we must necessarily have

$$P_i^2 = P_i, \quad\text{i.e.,}\quad Q_i^2 = \lambda_iQ_i.$$

Hence the necessity of (a).
Next, since $Q_i(n\times n)$ is symmetric positive semi-definite of rank $n_i$ and $\lambda_i$ is a positive constant, therefore, by a well-known result [12, pp. A-16 and A-17], together with (a), there exists a matrix $L_i(n_i\times n)$ with $L_iL_i' = I(n_i)$ such that $Q_i = \lambda_iL_i'L_i$, and we can make the transformation

$$X(p\times n)\,L_i'(n\times n_i) = Y_i(p\times n_i).$$

Thus, if $(1/n_i\lambda_i)XQ_iX' = (1/n_i)Y_iY_i'$ has a central pseudo-Wishart distribution, then, by definition, $E(Y_i)E(Y_i') = O(p\times p)$, i.e., $E(X)Q_iE(X') = O(p\times p)$, which proves the necessity of (b).

Finally, if $(1/\lambda_0)S_0, \cdots, (1/\lambda_k)S_k$ are distributed mutually independently in pseudo-Wishart forms with respective degrees of freedom $n_0, \cdots, n_k$, then, necessarily, their lth ($l = 1, 2, \cdots, p$) diagonal elements, viz.,

$$x_lP_0x_l', \cdots, x_lP_kx_l',$$

where $P_i = (1/\lambda_i)Q_i$ ($i = 0, 1, \cdots, k$), are distributed as constant multiples of mutually independent $\chi^2$ variates with respective degrees of freedom $n_0, n_1, \cdots$ and $n_k$. Hence, from [2, 6], we necessarily have

$$P_iP_j = O(n\times n) \quad\text{or}\quad Q_iQ_j = O(n\times n)$$

for $i \ne j = 0, 1, \cdots, k$, which proves the necessity of (c).
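Sufficiency of (a), (b), and (c). We shall now assume that $S_0, S_1, \cdots, S_k$ satisfy (a), (b) and (c), so that

$$Q_i(n\times n) = \lambda_iL_i'(n\times n_i)L_i(n_i\times n),$$

where $L_iL_i' = I(n_i)$ for $i = 0, 1, \cdots, k$ and $L_iL_j' = O(n_i\times n_j)$ for $i \ne j = 0, 1, \cdots, k$, and, also, $E(X)Q_iE(X') = O(p\times p)$ for $i = 0, 1, \cdots, k$. When $L_0, L_1, \cdots, L_k$ satisfy these conditions, it is well known that we can find a completion $L^*((n - \sum_{i=0}^k n_i)\times n)$ of the matrix

$$\begin{bmatrix} L_0\\ L_1\\ \vdots\\ L_k\end{bmatrix}$$

such that the completed matrix

$$L(n\times n) = \begin{bmatrix} L_0\\ \vdots\\ L_k\\ L^*\end{bmatrix}$$

is orthogonal. Let us now make the transformation

$$Y(p\times n) = [\,Y_0\;\; Y_1\;\cdots\; Y_k\;\; Y^*\,] = X(p\times n)\,[\,L_0'\;\; L_1'\;\cdots\; L_k'\;\; L^{*\prime}\,],$$

where the column blocks have $n_0, n_1, \cdots, n_k$ and $n - \sum_{i=0}^k n_i$ columns respectively, so that the Jacobian of the transformation is unity. Notice that

$$\sum_{i=0}^k Y_iY_i' + Y^*Y^{*\prime} = \sum_{i=0}^k \frac{1}{\lambda_i}XQ_iX' + XL^{*\prime}L^*X'.$$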


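Starting from the distribution (2.3.1) of $X(p\times n)$, we therefore have for the joint distribution of $Y_0, \cdots, Y_k$ and $Y^*$

$$(2\pi)^{-pn/2}|\Sigma|^{-n/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\left\{\sum_{i=0}^k (Y_i-\eta_i)(Y_i'-\eta_i') + (Y^*-\eta^*)(Y^{*\prime}-\eta^{*\prime})\right\}\right]\prod_{i=0}^k dY_i\,dY^*,$$

where

$$[\,\eta_0\;\;\eta_1\;\cdots\;\eta_k\;\;\eta^*\,] = \xi(p\times n)\,[\,L_0'\;\;L_1'\;\cdots\;L_k'\;\;L^{*\prime}\,],$$

with column blocks of sizes $n_0, n_1, \cdots, n_k$ and $n - \sum_{i=0}^k n_i$, so that $E(Y_i) = \eta_i$ ($i = 0, 1, \cdots, k$) and $E(Y^*) = \eta^*$. The elements of each Y matrix, of course, vary between $-\infty$ and $\infty$. Integrating out over $Y^*$, we have for the joint distribution of $Y_0, Y_1, \cdots, Y_k$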
$$(2\pi)^{-p(n_0+\cdots+n_k)/2}|\Sigma|^{-(n_0+\cdots+n_k)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\sum_{i=0}^k (Y_i-\eta_i)(Y_i'-\eta_i')\right]dY_0\cdots dY_k, \tag{2.3.4}$$

where, of course, $E(Y_i) = \eta_i$ and $E[(Y_i-\eta_i)(Y_i'-\eta_i')] = n_i\Sigma(p\times p)$ for $i = 0, 1, \cdots, k$. From (2.3.4), it follows, by definition, that

$$\frac{1}{n_i}Y_iY_i' = \frac{1}{n_i\lambda_i}XQ_iX' = \frac{1}{\lambda_i}S_i, \tag{2.3.5}$$

for $i = 0, 1, \cdots, k$, has a pseudo-Wishart distribution with $n_i$ degrees of freedom. Also, if $E(X)Q_iE(X') = O(p\times p)$, then, since $\lambda_i$ is a positive constant,

$$(1/\lambda_i)E(X)Q_iE(X') = E(X)L_i'L_iE(X') = E(Y_i)E(Y_i') = \eta_i\eta_i' = O(p\times p).$$

Hence, again by definition, the pseudo-Wishart distribution of $(1/\lambda_i)S_i$ ($i = 0, 1, \cdots, k$) is central. Finally, from (2.3.4), we observe that $Y_0, Y_1, \cdots, Y_k$ are mutually independent, and hence it follows from (2.3.5) that

$$(1/\lambda_0)S_0, \cdots, (1/\lambda_k)S_k$$

are mutually independent. Hence the sufficiency of (a), (b) and (c).
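The sufficiency argument rests on completing $L_0, \cdots, L_k$ to an orthogonal $L(n\times n)$. The sketch below (sizes and $\lambda_i$ are hypothetical) builds such an L from a QR factorization and checks numerically the identity displayed after the transformation, namely that the blocks $Y_iY_i'$ together with $Y^*Y^{*\prime}$ add back to $XX'$, and that each $Y_iY_i'$ equals $(1/\lambda_i)XQ_iX'$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 2, 6
sizes = [2, 2, 1]                        # hypothetical n_0, n_1, n_2; one row is left for L*
lambdas = [1.0, 3.0, 0.5]                # hypothetical lambda_i
orth, _ = np.linalg.qr(rng.standard_normal((n, n)))
big_l = orth.T                           # completed orthogonal L(n x n): L_0, L_1, L_2, then L*
blocks = np.split(big_l, np.cumsum(sizes))   # the last block is L*
x = rng.standard_normal((p, n))
ys = [x @ l.T for l in blocks]           # Y_i = X L_i' and Y* = X L*'

total = sum(y @ y.T for y in ys)
assert np.allclose(total, x @ x.T)       # sum_i Y_i Y_i' + Y* Y*' = X X'
for lam, l, y in zip(lambdas, blocks[:-1], ys[:-1]):
    q = lam * l.T @ l                    # Q_i = lambda_i L_i' L_i
    assert np.allclose(y @ y.T, x @ q @ x.T / lam)
print(abs(np.linalg.det(big_l)))         # approximately 1.0, i.e. unit Jacobian
```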
Lemma 3: If $X(p\times n)$ has the distribution

$$(2\pi)^{-pn/2}|\Sigma|^{-n/2}|B|^{-p/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}(X-\xi)B^{-1}(X'-\xi')\right]dX, \qquad -\infty < x_{ij} < \infty, \tag{2.3.6}$$

where $B(n\times n)$ and $\Sigma(p\times p)$ are symmetric positive definite, then a set of necessary and sufficient conditions for $S_0, S_1, \cdots, S_k$ (defined immediately before (2.3.2)) to satisfy the conditions (2.3.2) and (2.3.3) is given by

(α) $Q_iBQ_i = \lambda_iQ_i$, $i = 0, 1, \cdots, k$;
(β) $E(X)Q_iE(X') = O(p\times p)$, $i = 0, 1, \cdots, k$; and
(γ) $Q_iBQ_j = O(n\times n)$, $i \ne j = 0, 1, \cdots, k$.


Proof: Since $B(n\times n)$ is symmetric positive definite, there exists a non-singular matrix $T(n\times n)$ such that

$$B(n\times n) = T(n\times n)T'(n\times n),$$

so that

$$B^{-1} = (T')^{-1}T^{-1}.$$

Writing $Y(p\times n) = X(p\times n)(T')^{-1}$, or $X = YT'$, and $\theta(p\times n) = \xi(p\times n)(T')^{-1}$, we have the Jacobian of the transformation to be $|T'|^p = |B|^{p/2}$.

Then we notice that $(1/n_i\lambda_i)XQ_iX' = (1/n_i\lambda_i)YT'Q_iTY'$, and, from (2.3.6), the distribution of $Y(p\times n)$ is

$$(2\pi)^{-pn/2}|\Sigma|^{-n/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}(Y-\theta)(Y'-\theta')\right]dY, \qquad -\infty < y_{ij} < \infty,$$

which is of the same form as (2.3.1). Now applying Lemma 2 to the matrices $(1/n_i\lambda_i)YT'Q_iTY'$ (notice that $\operatorname{rank}(T'Q_iT) = \operatorname{rank}(Q_i) = n_i$), we obtain that a set of necessary and sufficient conditions for these matrices to satisfy (2.3.2) and (2.3.3) is

(α) $T'Q_iTT'Q_iT = \lambda_iT'Q_iT$, or, $Q_iBQ_i = \lambda_iQ_i$ ($i = 0, 1, \cdots, k$);

(β) $E(Y)T'Q_iTE(Y') = O(p\times p)$, or, $E(X)Q_iE(X') = O(p\times p)$ ($i = 0, 1, \cdots, k$); and, finally,

(γ) $T'Q_iTT'Q_jT = O(n\times n)$, or, $Q_iBQ_j = O(n\times n)$ ($i \ne j = 0, 1, \cdots, k$).

Hence the lemma is proved.
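The proof reduces Lemma 3 to Lemma 2 through $B = TT'$. A small numerical illustration, under assumed (random) B, $L_i$ and $\lambda_i$: taking the Cholesky factor as T and $Q_i = \lambda_i(T')^{-1}L_i'L_iT^{-1}$, so that $T'Q_iT = \lambda_iL_i'L_i$ satisfies Lemma 2, the conditions (α) and (γ) hold with B inserted between the $Q$'s.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
g = rng.standard_normal((n, n))
b = g @ g.T + n * np.eye(n)               # hypothetical symmetric p.d. B(n x n)
t = np.linalg.cholesky(b)                 # B = T T'
t_inv = np.linalg.inv(t)
orth, _ = np.linalg.qr(rng.standard_normal((n, n)))
l0, l1 = orth.T[:2], orth.T[2:4]          # orthonormal, mutually orthogonal rows
lam0, lam1 = 1.0, 2.0                     # hypothetical lambda_0, lambda_1
# Q_i = lambda_i (T')^{-1} L_i' L_i T^{-1}, so that T' Q_i T = lambda_i L_i' L_i
q0 = lam0 * t_inv.T @ l0.T @ l0 @ t_inv
q1 = lam1 * t_inv.T @ l1.T @ l1 @ t_inv

print(np.allclose(q0 @ b @ q0, lam0 * q0),    # condition (alpha), i = 0
      np.allclose(q1 @ b @ q1, lam1 * q1),    # condition (alpha), i = 1
      np.allclose(q0 @ b @ q1, 0.0))          # condition (gamma)
```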
Next, under the general multivariate Model II, we have noted that

$$x(pn\times 1) = \begin{bmatrix} x_1'\\ \vdots\\ x_p'\end{bmatrix}$$

has the distribution

$$(2\pi)^{-pn/2}|\Sigma^*|^{-1/2}\exp\left[-\tfrac{1}{2}\left\{(x_1\cdots x_p) - E(x_1\cdots x_p)\right\}\Sigma^{*-1}\left\{\begin{bmatrix} x_1'\\ \vdots\\ x_p'\end{bmatrix} - E\begin{bmatrix} x_1'\\ \vdots\\ x_p'\end{bmatrix}\right\}\right]dx, \tag{2.3.7}$$

where $E(x)$ and $\Sigma^*$ are defined respectively in (2.1.2) and (2.1.3). In order that this distribution of $X(p\times n)$ be, essentially, of the same form as (2.3.6), we should be able to express the exponent in (2.3.7), except for the constant factor $-1/2$, in the form

$$\operatorname{tr}M_2^{-1}(X-\xi)M_1^{-1}(X'-\xi'),$$

where $M_1(n\times n)$ and $M_2(p\times p)$ are symmetric positive definite matrices and where $E(X) = \xi$.


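Lemma 4: A necessary and sufficient condition for this is that

$$\Sigma^*(pn\times pn) = M_1(n\times n)\,\dot{\times}\,M_2(p\times p). \tag{2.3.8}$$

Proof: Sufficiency of the condition. If $\Sigma^* = M_1\dot{\times}M_2$, then it is known [5] that $\Sigma^{*-1} = M_1^{-1}\dot{\times}M_2^{-1}$. Now, let

$$M_1^{-1}(n\times n) = \begin{bmatrix} m_{11}^{(1)} & m_{12}^{(1)} & \cdots & m_{1n}^{(1)}\\ m_{21}^{(1)} & m_{22}^{(1)} & \cdots & m_{2n}^{(1)}\\ \vdots & & & \vdots\\ m_{n1}^{(1)} & m_{n2}^{(1)} & \cdots & m_{nn}^{(1)}\end{bmatrix}, \qquad M_2^{-1}(p\times p) = \begin{bmatrix} m_{11}^{(2)} & m_{12}^{(2)} & \cdots & m_{1p}^{(2)}\\ m_{21}^{(2)} & m_{22}^{(2)} & \cdots & m_{2p}^{(2)}\\ \vdots & & & \vdots\\ m_{p1}^{(2)} & m_{p2}^{(2)} & \cdots & m_{pp}^{(2)}\end{bmatrix}.$$

Then,

$$\begin{aligned}
[x_1 - E(x_1), \cdots, x_p - E(x_p)]\,\Sigma^{*-1}\begin{bmatrix} x_1' - E(x_1')\\ \vdots\\ x_p' - E(x_p')\end{bmatrix}
&= \sum_{i,j=1}^p m_{ij}^{(2)}[x_i - E(x_i)]\,M_1^{-1}(n\times n)\,[x_j' - E(x_j')]\\
&= \operatorname{tr}M_2^{-1}\begin{bmatrix} x_1 - E(x_1)\\ \vdots\\ x_p - E(x_p)\end{bmatrix}M_1^{-1}[x_1' - E(x_1'), \cdots, x_p' - E(x_p')]\\
&= \operatorname{tr}M_2^{-1}[X - E(X)]\,M_1^{-1}[X' - E(X')].
\end{aligned}$$

Hence the sufficiency of the condition (2.3.8).

Necessity of the condition. Supposing now that

$$[x_1 - E(x_1), \cdots, x_p - E(x_p)]\,\Sigma^{*-1}\begin{bmatrix} x_1' - E(x_1')\\ \vdots\\ x_p' - E(x_p')\end{bmatrix} = \operatorname{tr}M_2^{-1}[X - E(X)]\,M_1^{-1}[X' - E(X')],$$

and writing $M_1^{-1}$ and $M_2^{-1}$ as before, we can argue backwards in the proof of the sufficiency of the condition and obtain that $\Sigma^{*-1} = M_1^{-1}\dot{\times}M_2^{-1}$, so that $\Sigma^* = M_1\dot{\times}M_2$.

Hence the lemma is proved.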
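The sufficiency computation can be checked numerically. In the sketch below, $M_1$, $M_2$ and $X - E(X)$ are random stand-ins, and we assume that, with the rows $x_1, \cdots, x_p$ stacked one after another as in (2.3.7), the paper's direct product $M_1\dot{\times}M_2$ is the matrix whose (i, j) block of size $n\times n$ is $(M_2)_{ij}M_1$; in NumPy that is `np.kron(m2, m1)`. The quadratic form with $\Sigma^{*-1}$ then matches $\operatorname{tr}M_2^{-1}[X - E(X)]M_1^{-1}[X' - E(X')]$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 3, 4

def random_spd(k):
    g = rng.standard_normal((k, k))
    return g @ g.T + k * np.eye(k)                # symmetric positive definite

m1, m2 = random_spd(n), random_spd(p)             # M_1(n x n), M_2(p x p)
d = rng.standard_normal((p, n))                   # D = X - E(X); row i is x_i - E(x_i)
vec = d.reshape(-1)                               # [x_1 - E(x_1), ..., x_p - E(x_p)] as one pn-vector
sigma_star = np.kron(m2, m1)                      # (i, j) block = (M_2)_ij * M_1
lhs = vec @ np.linalg.solve(sigma_star, vec)      # quadratic form with Sigma*^{-1}
rhs = np.trace(np.linalg.inv(m2) @ d @ np.linalg.inv(m1) @ d.T)
print(np.isclose(lhs, rhs))                       # True
```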


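Further, we can establish that, for $\Sigma^*(pn\times pn)$, defined in (2.1.3), under the general multivariate Model II with a perfectly general design matrix, to be expressible as $M_1\dot{\times}M_2$, we have the necessary and sufficient conditions

$$\Sigma_i(p\times p) = \sigma_i^2\,\Sigma(p\times p), \qquad (i = 1, 2, \cdots, k),$$

which yield the restricted Model II. That these conditions on $\Sigma_i(p\times p)$ are sufficient is easily verified. That they are necessary can be demonstrated as follows, where, for simplicity of argument, we assume that p = 2.

Suppose, since p = 2 here, that

$$\Sigma^*(2n\times 2n) = M_1(n\times n)\,\dot{\times}\,M_2(2\times 2),$$

where $M_1$ and $M_2$ are symmetric positive definite matrices. Then, from (2.1.3), we have

$$A_1A_1'\dot{\times}\Sigma_1 + \cdots + A_kA_k'\dot{\times}\Sigma_k + I(n)\dot{\times}\Sigma = M_1\dot{\times}M_2.$$

From this we obtain the equations

$$\begin{aligned}
A_1A_1'[\sigma_{12}^{(1)} - c_1\sigma_{11}^{(1)}] + A_2A_2'[\sigma_{12}^{(2)} - c_1\sigma_{11}^{(2)}] + \cdots + A_kA_k'[\sigma_{12}^{(k)} - c_1\sigma_{11}^{(k)}] + I(n)[\sigma_{12} - c_1\sigma_{11}] &= O(n\times n),\\
A_1A_1'[\sigma_{22}^{(1)} - c_2\sigma_{11}^{(1)}] + A_2A_2'[\sigma_{22}^{(2)} - c_2\sigma_{11}^{(2)}] + \cdots + A_kA_k'[\sigma_{22}^{(k)} - c_2\sigma_{11}^{(k)}] + I(n)[\sigma_{22} - c_2\sigma_{11}] &= O(n\times n),
\end{aligned} \tag{2.3.9}$$

where $c_1 = (M_2)_{12}/(M_2)_{11}$, $c_2 = (M_2)_{22}/(M_2)_{11}$, $(M_2)_{ij}$ is the (i, j)th ($i, j = 1, 2$) element of $M_2(2\times 2)$, and where

$$\Sigma_i(2\times 2) = \begin{bmatrix}\sigma_{11}^{(i)} & \sigma_{12}^{(i)}\\ \sigma_{12}^{(i)} & \sigma_{22}^{(i)}\end{bmatrix}$$

for $i = 1, 2, \cdots, k$, and

$$\Sigma(2\times 2) = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{bmatrix}.$$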
For the equations (2.3.9) to hold, we must have either

$$\frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}} = \frac{\sigma_{12}^{(2)}}{\sigma_{11}^{(2)}} = \cdots = \frac{\sigma_{12}^{(k)}}{\sigma_{11}^{(k)}} = \frac{\sigma_{12}}{\sigma_{11}} = c_1 \quad\text{and}\quad \frac{\sigma_{22}^{(1)}}{\sigma_{11}^{(1)}} = \frac{\sigma_{22}^{(2)}}{\sigma_{11}^{(2)}} = \cdots = \frac{\sigma_{22}^{(k)}}{\sigma_{11}^{(k)}} = \frac{\sigma_{22}}{\sigma_{11}} = c_2,$$

or

$$A_iA_i' = a_iI(n), \qquad (i = 1, 2, \cdots, k),$$

where $a_i$ is a scalar constant. These latter conditions on the submatrices of the design matrix are too restrictive and unrealistic, so that, for a perfectly general design matrix, the former conditions hold necessarily if $\Sigma^* = M_1\dot{\times}M_2$, and they are verified to be equivalent to the conditions $\Sigma_i = \sigma_i^2\Sigma$, for $i = 1, 2, \cdots, k$, where the $\sigma_i^2$ are certain positive constants. The proof of the necessity of the conditions for general p follows exactly along the same lines.

We have thus set up, for reasons of easier mathematical treatment, the restricted multivariate Model II mentioned in section 2.1, and under this restricted set-up we have, for $X(p\times n)$, the distribution

$$(2\pi)^{-pn/2}|\Sigma|^{-n/2}\left|\sum_{i=1}^k\sigma_i^2A_iA_i' + I(n)\right|^{-p/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}(X - E(X))\left(\sum_{i=1}^k\sigma_i^2A_iA_i' + I(n)\right)^{-1}(X' - E(X'))\right]dX, \tag{2.3.10}$$

since $|\Sigma^*| = |M_1(n\times n)\dot{\times}M_2(p\times p)| = |M_1|^p|M_2|^n$ by [5].
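The determinant identity used for the normalizing constant of (2.3.10) can be confirmed numerically. This sketch assumes a hypothetical one-way layout (three levels, two observations each) so that $M_1 = \sigma_1^2A_1A_1' + I(n)$, takes an arbitrary $\Sigma$ as $M_2$, and uses the same block convention as `np.kron(m2, m1)` for the direct product.

```python
import numpy as np

# hypothetical one-way layout: m_1 = 3 levels, 2 observations per level, n = 6
a1 = np.kron(np.eye(3), np.ones((2, 1)))          # A_1(n x m_1), one unit entry per row
sigma1_sq = 0.7                                    # hypothetical sigma_1^2
m1 = sigma1_sq * a1 @ a1.T + np.eye(6)             # M_1 = sigma_1^2 A_1 A_1' + I(n)
m2 = np.array([[1.0, 0.4], [0.4, 2.0]])            # M_2 = Sigma(p x p), p = 2
p, n = m2.shape[0], m1.shape[0]

_, logdet_star = np.linalg.slogdet(np.kron(m2, m1))
_, logdet1 = np.linalg.slogdet(m1)
_, logdet2 = np.linalg.slogdet(m2)
print(np.isclose(logdet_star, p * logdet1 + n * logdet2))  # True
```

Here each diagonal block of $M_1$ is $\sigma_1^2J + I$ of order 2, so $|M_1| = (1 + 2\sigma_1^2)^3$, which gives a second, hand-checkable value.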

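Next, suppose that $Q_i(n\times n)$ ($i = 0, 1, \cdots, k$) is a symmetric, at least positive semi-definite matrix of rank $n_i$ ($\le n$) such that $E(X)Q_iE(X') = O(p\times p)$; then, we have, under the multivariate Model II,

$$\begin{aligned}
\Lambda_i(p\times p) &= \frac{1}{n_i}E(XQ_iX') = \frac{1}{n_i}E\begin{bmatrix} x_1Q_ix_1' & \cdots & x_1Q_ix_p'\\ x_2Q_ix_1' & \cdots & x_2Q_ix_p'\\ \vdots & & \vdots\\ x_pQ_ix_1' & \cdots & x_pQ_ix_p'\end{bmatrix}\\
&= \frac{1}{n_i}\left[\sum_{j=1}^k\operatorname{tr}(A_jA_j'Q_i)\,\Sigma_j + (\operatorname{tr}Q_i)\,\Sigma\right],
\end{aligned}$$

by using Lemma 3 of [10] and simplifying; hence

$$\Lambda_i(p\times p) = \frac{1}{n_i}\left[\sum_{j=1}^k\sigma_j^2\operatorname{tr}(A_jA_j'Q_i) + \operatorname{tr}Q_i\right]\Sigma(p\times p) \tag{2.3.11}$$

for the restricted multivariate Model II.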

Also, if $(1/\lambda_i)S_i(p\times p) = (1/n_i\lambda_i)XQ_iX'$ has a central pseudo-Wishart distribution with $n_i$ (= rank of $Q_i$) degrees of freedom, then, under the restricted multivariate Model II, where $X(p\times n)$ has the distribution (2.3.10), we have

$$E(S_i) = \lambda_i\Sigma = \Lambda_i(p\times p),$$

so that, from (2.3.11),

$$\lambda_i = \frac{1}{n_i}\left[\sum_{j=1}^k\sigma_j^2\operatorname{tr}(A_jA_j'Q_i) + \operatorname{tr}Q_i\right].$$
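To see the expression for $\lambda_i$ at work, the sketch below evaluates it for an assumed balanced one-way layout (k = 1, m levels with r observations each), using the usual error matrix $Q_0 = I - A_1A_1'/r$ and between-level matrix $Q_1 = A_1A_1'/r - J/n$; these particular $Q_i$, the helper name `lambda_from_q` and the value of $\sigma_1^2$ are illustrative choices. It returns $\lambda_0 = 1$ and $\lambda_1 = r\sigma_1^2 + 1$.

```python
import numpy as np

def lambda_from_q(q, a_list, sigma_sq):
    """lambda_i = (1/n_i)[sum_j sigma_j^2 tr(A_j A_j' Q_i) + tr Q_i], with n_i = rank(Q_i)."""
    n_i = np.linalg.matrix_rank(q)
    total = sum(s2 * np.trace(a @ a.T @ q) for a, s2 in zip(a_list, sigma_sq))
    return (total + np.trace(q)) / n_i

m, r = 4, 3                                   # hypothetical: m_1 = 4 levels, r = 3 observations each
n = m * r
a1 = np.kron(np.eye(m), np.ones((r, 1)))      # A_1(n x m_1): a single unit entry in each row
proj = a1 @ a1.T / r                          # orthogonal projector onto the columns of A_1
q0 = np.eye(n) - proj                         # error matrix, rank n - m
q1 = proj - np.ones((n, n)) / n               # between-level matrix, rank m - 1
sigma1_sq = 0.8                               # hypothetical sigma_1^2
lam0 = lambda_from_q(q0, [a1], [sigma1_sq])
lam1 = lambda_from_q(q1, [a1], [sigma1_sq])
print(lam0, lam1)                             # approximately 1.0 and r * 0.8 + 1 = 3.4
```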


Again, under the restricted multivariate Model II, if we apply the conditions (α), (β), and (γ) of Lemma 3 (remembering that $B(n\times n)$ of (2.3.6) is now replaced by $\sum_{i=1}^k\sigma_i^2A_iA_i' + I(n)$) to the (k + 1) matrices $(1/\lambda_0)S_0, \cdots, (1/\lambda_k)S_k$, and then require, as in the univariate case discussed in [10, Section 2.3], that these matrices satisfy the conditions (α) and (γ) for all $\sigma_1^2, \cdots, \sigma_k^2$, we have, after some simplification,

$$Q_iA_lA_l'Q_i = \left[\frac{1}{n_i}\operatorname{tr}A_lA_l'Q_i\right]Q_i, \quad l = 1, 2, \cdots, k; \qquad Q_i^2 = \left[\frac{1}{n_i}\operatorname{tr}Q_i\right]Q_i \qquad (i = 0, 1, \cdots, k), \tag{2.3.13}$$

and

$$Q_iA_lA_l'Q_j = O(n\times n), \quad l = 1, 2, \cdots, k; \qquad Q_iQ_j = O(n\times n), \qquad \text{for } i \ne j = 0, 1, \cdots, k. \tag{2.3.14}$$


It is seen that these conditions, (2.3.13) and (2.3.14), are exactly the same as those obtained for the univariate problem [cf. (2.3.4) and (2.3.5) of [10]]. Thus, for a given design matrix $A(n\times m)$, the same $Q_0, Q_1, \cdots, Q_k$ which satisfy (2.3.4) and (2.3.5) of [10] for the univariate case also satisfy (2.3.13) and (2.3.14) under the restricted multivariate Model II set-up. Also, for given $Q_0, Q_1, \cdots, Q_k$, the same design matrix $A(n\times m)$ which satisfies (2.3.4) and (2.3.5) of [10] under the univariate Model II set-up also satisfies (2.3.13) and (2.3.14) under the restricted multivariate Model II set-up.
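As an illustration of (2.3.13) and (2.3.14), the sketch below checks them for the same kind of assumed balanced one-way layout (k = 1, hypothetical sizes), with $Q_0$ the error matrix and $Q_1$ the between-level matrix. Both sets of conditions hold, so these $Q_i$ satisfy (α) and (γ) of Lemma 3 for every value of $\sigma_1^2$.

```python
import numpy as np

m, r = 4, 3                                # hypothetical balanced one-way layout
n = m * r
a1 = np.kron(np.eye(m), np.ones((r, 1)))
aat = a1 @ a1.T
q0 = np.eye(n) - aat / r                   # error matrix
q1 = aat / r - np.ones((n, n)) / n         # between-level matrix
qs = [q0, q1]

def check_2313(q):
    """Q A A' Q = [(1/n_i) tr(A A' Q)] Q and Q^2 = [(1/n_i) tr Q] Q."""
    n_i = np.linalg.matrix_rank(q)
    ok_a = np.allclose(q @ aat @ q, np.trace(aat @ q) / n_i * q)
    ok_i = np.allclose(q @ q, np.trace(q) / n_i * q)
    return ok_a and ok_i

def check_2314(qi, qj):
    """Q_i A A' Q_j = O and Q_i Q_j = O for i != j."""
    return np.allclose(qi @ aat @ qj, 0.0) and np.allclose(qi @ qj, 0.0)

print([check_2313(q) for q in qs], check_2314(q0, q1))   # [True, True] True
```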

We shall next present a tie-up, for the multivariate restricted k-way classification, between the analysis under Model I and the analysis under Model II.

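2.4. Tie-up between the analyses under the multivariate Models I and II for the restricted k-way classification. We recall from section 1.2 that, under the multivariate Model I, we can obtain k matrices due to the k testable hypotheses of equality of the row vectors of $\xi_i(m_i\times p)$ ($i = 1, 2, \cdots, k$), which can, by analogy with (1.2.1), be written as

$$H_{0i}: C_i((m_i - 1)\times m)\,\xi(m\times p) = O((m_i - 1)\times p), \tag{2.4.1}$$

where

$$C_i((m_i-1)\times m) = \big[\,O \;\cdots\; O \;\; C_i^* \;\; O \;\cdots\; O\,\big], \qquad C_i^*((m_i-1)\times m_i) = \begin{bmatrix} 1 & 0 & \cdots & 0 & -1\\ 0 & 1 & \cdots & 0 & -1\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & -1\end{bmatrix},$$

the column blocks of $C_i$ having $m_1, m_2, \cdots, m_i, \cdots, m_k$ columns, with $C_i^*$ occupying the ith block, $r = \operatorname{rank}(A) = m - k + 1$, and $m = \sum_{i=1}^k m_i$,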







so that $\operatorname{rank}(C_i) = m_i - 1$ ($i = 1, 2, \cdots, k$). As in section 1.2, we can obtain k matrices due to the k hypotheses $H_{01}, H_{02}, \cdots, H_{0k}$, viz.,

$$XA_1(A_1'A_1)^{-1}C_i'\left[C_i(A_1'A_1)^{-1}C_i'\right]^{-1}C_i(A_1'A_1)^{-1}A_1'X' \tag{2.4.2}$$

for $i = 1, 2, \cdots, k$; these matrices are symmetric and at least positive semi-definite of rank, almost everywhere, $\min(p, m_i - 1)$. We have, also, the matrix due to error

$$X\left[I(n) - A_1(A_1'A_1)^{-1}A_1'\right]X', \tag{2.4.3}$$

which is symmetric positive definite (almost everywhere), since we assume in the model that $p \le n - r$, where for the restricted k-way classification $r = m - k + 1$.

Now, under the multivariate Model I, in the notation of section 2.3, suppose we take $n_0S_0 = XQ_0X'$ as the matrix due to error given by (2.4.3), with $n_0 = n - r = n - m + k - 1$, and $n_iS_i = XQ_iX'$, for $i = 1, 2, \cdots, k$, as the matrices due to hypotheses given by (2.4.2), with $n_i = \operatorname{rank}(Q_i) = m_i - 1$. Notice that $\sum_{i=0}^k n_i = n - 1 < n$. It is seen that these $Q_i$, $i = 0, 1, \cdots, k$, are the same as those for the univariate case [cf. section 2.4 of [10]]. We may verify that all these $Q_i$'s are such that $E(X)Q_iE(X') = O(p\times p)$ under the multivariate Model II, and hence we obtain that

$$\Lambda_i = \frac{1}{n_i}E(XQ_iX') = v_i\Sigma_i + \Sigma \tag{2.4.4}$$

for the general multivariate Model II, where

$$v_i = \frac{2}{m_i - 1}\left(\text{sum of the elements along and below the diagonal of }\left[C_i(A_1'A_1)^{-1}C_i'\right]^{-1}\right),$$

and $\Lambda_i = (v_i\sigma_i^2 + 1)\Sigma$ for the restricted multivariate Model II ($i = 1, 2, \cdots, k$); also,

$$\Lambda_0 = \frac{1}{n_0}E(XQ_0X') = \Sigma.$$

Therefore, under the restricted multivariate Model II, we have, for the above $Q_i(n\times n)$,

$$\lambda_i = v_i\sigma_i^2 + 1, \quad i = 1, 2, \cdots, k, \quad\text{and}\quad \lambda_0 = 1. \tag{2.4.5}$$
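The constant $v_i$ can be computed directly from the contrast matrix. The sketch below assumes k = 1, so that the full-rank design matrix in (2.4.2) is simply a one-way incidence matrix (the helper names and group sizes are hypothetical), and reads the definition of $v_i$ accompanying (2.4.4) as $2/(m_i - 1)$ times the sum of the elements on and below the diagonal of $[C_i(A_1'A_1)^{-1}C_i']^{-1}$. It checks that this equals $\operatorname{tr}(A_1A_1'Q_1)/(m_1 - 1)$, the coefficient of $\sigma_1^2$ in $\lambda_1$, and that $v_1 = r$ in the balanced case.

```python
import numpy as np

def one_way_design(counts):
    """Hypothetical one-way incidence matrix: counts[j] observations at level j."""
    eye = np.eye(len(counts))
    return np.vstack([eye[[j] * c] for j, c in enumerate(counts)])

def contrast_matrix(m):
    return np.hstack([np.eye(m - 1), -np.ones((m - 1, 1))])   # C* = [I | -1]

def v_from_contrasts(a):
    m = a.shape[1]
    c = contrast_matrix(m)
    core = np.linalg.inv(c @ np.linalg.inv(a.T @ a) @ c.T)    # [C (A'A)^{-1} C']^{-1}
    return 2.0 * np.tril(core).sum() / (m - 1)                # on-and-below-diagonal sum

def v_from_trace(a):
    m = a.shape[1]
    c = contrast_matrix(m)
    inv_ata = np.linalg.inv(a.T @ a)
    h = a @ inv_ata @ c.T
    q = h @ np.linalg.inv(c @ inv_ata @ c.T) @ h.T            # Q_1 from (2.4.2)
    return np.trace(a @ a.T @ q) / (m - 1)                    # coefficient of sigma_1^2 in lambda_1

balanced = one_way_design([3, 3, 3, 3])
unbalanced = one_way_design([2, 3, 5])
print(v_from_contrasts(balanced))                                           # approximately 3.0 (= r)
print(np.isclose(v_from_contrasts(unbalanced), v_from_trace(unbalanced)))   # True
```

The agreement rests on $C_i^*C_i^{*\prime} = I + J$, so that $\operatorname{tr}(MC_i^*C_i^{*\prime}) = \operatorname{tr}M + \mathbf{1}'M\mathbf{1}$, which is twice the on-and-below-diagonal sum of a symmetric M.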


If now, under the restricted multivariate Model II, we apply the conditions (α), (β), and (γ) of Lemma 3 to the set of matrices $(1/\lambda_0)S_0, (1/\lambda_1)S_1, \cdots, (1/\lambda_k)S_k$, taken as above, we notice that they all satisfy (β), so that their distributions are all central (if they are pseudo-Wishart at all). Furthermore, $(1/\lambda_0)S_0 = S_0$ (by (2.4.5)), where $n_0S_0$, the matrix due to error, can be seen to have the central pseudo-Wishart distribution (in fact, the ordinary Wishart distribution, since $S_0$ is positive definite here) with $n_0$ degrees of freedom, and to be distributed independently of $(1/\lambda_i)S_i$ for $i = 1, 2, \cdots, k$. Also, by applying (α) and (γ) of Lemma 3 to $(1/\lambda_i)S_i$ ($i = 1, 2, \cdots, k$), we observe that they are distributed mutually independently in central pseudo-Wishart forms with respective degrees of freedom $n_1, n_2, \cdots, n_k$ if and only if

$$C_i(A_1'A_1)^{-1}C_i' = \frac{1}{v_i}\left[I(m_i - 1) + J((m_i - 1)\times(m_i - 1))\right], \qquad (i = 1, 2, \cdots, k), \tag{2.4.6}$$

and

$$C_i(A_1'A_1)^{-1}C_j' = O((m_i - 1)\times(m_j - 1)), \qquad (i \ne j = 1, 2, \cdots, k), \tag{2.4.7}$$

where we recall that I(p) denotes the identity matrix of order p and $J(p\times q)$ denotes a matrix of p rows and q columns all of whose elements are equal to unity. The conditions (2.4.6) and (2.4.7), which are independent of the unknown variance components, are the same as (2.4.5) and (2.4.6) of [10] for the univariate case.

Recalling the remarks toward the end of section 2.3, we observe that these 
conditions, (2.4.6) and (2.4.7), are both satisfied by the multivariate analogues 
of the usual univariate complete block designs. 

Finally, it may be seen from (2.4.4) that we can take $(1/v_i)(S_i - S_0)$ as an unbiased estimate of $\Sigma_i(p\times p)$, for $i = 1, \cdots, k$, and $S_0$ as an unbiased estimate of $\Sigma(p\times p)$. We may, therefore, use $c[(1/v_i)(S_i - S_0)]$ as estimates of $c(\Sigma_i)$ and $c(S_0)$ as estimates of $c(\Sigma)$.
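A Monte Carlo sketch of these unbiased estimates, under an assumed balanced one-way layout (k = 1, $v_1 = r$) and hypothetical $\Sigma_1$, $\Sigma$ and common mean. Since (2.4.4) holds for the general Model II, $\Sigma_1$ is deliberately not taken proportional to $\Sigma$. Averaged over many simulated X, $(1/v_1)(S_1 - S_0)$ should be close to $\Sigma_1$ and $S_0$ close to $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(7)
p, m, r, reps = 2, 5, 4, 20_000
n = m * r
a1 = np.kron(np.eye(m), np.ones((r, 1)))         # one-way incidence matrix A_1(n x m)
aat = a1 @ a1.T
q0 = np.eye(n) - aat / r                         # error matrix, n_0 = n - m
q1 = aat / r - np.ones((n, n)) / n               # matrix for H_01, n_1 = m - 1
n0, n1, v1 = n - m, m - 1, r                     # v_1 = r for this balanced layout
sigma1 = np.array([[1.0, 0.5], [0.5, 2.0]])      # hypothetical Sigma_1 (not proportional to Sigma)
sigma = np.array([[1.0, -0.3], [-0.3, 0.5]])     # hypothetical Sigma
c1, c0 = np.linalg.cholesky(sigma1), np.linalg.cholesky(sigma)
mu = np.array([[3.0], [-1.0]])                   # common mean; Q_0 and Q_1 annihilate it

u = c1 @ rng.standard_normal((reps, p, m))       # random level effects, columns ~ N(0, Sigma_1)
e = c0 @ rng.standard_normal((reps, p, n))       # errors, columns ~ N(0, Sigma)
x = mu + u @ a1.T + e                            # reps draws of X(p x n)
s0 = x @ q0 @ x.transpose(0, 2, 1) / n0
s1 = x @ q1 @ x.transpose(0, 2, 1) / n1
est_sigma1 = ((s1 - s0) / v1).mean(axis=0)       # Monte Carlo mean of (1/v_1)(S_1 - S_0)
est_sigma = s0.mean(axis=0)                      # Monte Carlo mean of S_0
print(np.round(est_sigma1, 2))
print(np.round(est_sigma, 2))
```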

2.5 Tests of hypotheses on the multivariate variance components. The usual null hypotheses may be stated as

$$H_{0i}: \Sigma_i(p\times p) = O(p\times p), \quad\text{or, equivalently,}\quad c(\Sigma_i) = 0, \tag{2.5.1}$$

for $i = 1, 2, \cdots, k$. It is easily seen from (2.4.4) that, for the restricted k-way classification, $H_{0i}$ is equivalent to $\Lambda_i(p\times p) = \Lambda_0(p\times p)$ ($i = 1, 2, \cdots, k$), or, for the restricted multivariate Model II, to the hypotheses $\lambda_i = \lambda_0$ ($i = 1, 2, \cdots, k$). The alternative to this last form is taken to be $H_{1i}: \lambda_i > \lambda_0$. Assuming that the restricted k-way classification has matrices like (2.4.2) and (2.4.3) which satisfy both (2.4.6) and (2.4.7), we have, by definition of the pseudo-Wishart distribution, that $XQ_0X' = Y_0(p\times n_0)Y_0'(n_0\times p)$ and $XQ_iX' = Y_i(p\times n_i)Y_i'(n_i\times p)$ ($i = 1, 2, \cdots, k$), where $Y_0$ and $Y_i$ have the joint distribution

$$\text{const.}\;\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\left(\frac{1}{\lambda_0}Y_0Y_0' + \frac{1}{\lambda_i}Y_iY_i'\right)\right]dY_0\,dY_i,$$

and where $E(Y_0Y_0') = n_0\lambda_0\Sigma$ and $E(Y_iY_i') = n_i\lambda_i\Sigma$. Consider $a'(1\times p)Y_0(p\times n_0)$ and $a'(1\times p)Y_i(p\times n_i)$ for all non-null







$a(p\times 1)$. Then $(1/n_0)E(a'Y_0Y_0'a) = \lambda_0a'\Sigma a$ and $(1/n_i)E(a'Y_iY_i'a) = \lambda_ia'\Sigma a$. For testing $\lambda_i = \lambda_0$ against $\lambda_i > \lambda_0$ we may take as critical region

$$\omega_{ia}: \quad F_a(n_i, n_0) = \frac{a'Y_iY_i'a/n_i}{a'Y_0Y_0'a/n_0} \ge F_\alpha(n_i, n_0),$$

where $F_\alpha(n_i, n_0)$ is the upper $100\alpha\%$ point of the central F-distribution with $n_i$ and $n_0$ degrees of freedom. Taking $\omega_i = \bigcup_a\omega_{ia}$ as a critical region for the hypothesis (2.5.1), we obtain

$$\omega_i: \quad c_{\max}(S_iS_0^{-1}) \ge c_\alpha(p, n_i, n_0),$$

which is seen, from (1.2.4), to be the critical region of the test, at a level $\alpha^*$, for the hypothesis (2.4.1) under the multivariate Model I discussed in Section 1 of this paper.
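The statistic in $\omega_i$ is the largest characteristic root of $S_iS_0^{-1}$, which is also the largest value of the ratio $a'S_ia/a'S_0a$ over non-null a; this is exactly what the union over the $\omega_{ia}$ produces. The sketch below (random stand-ins for $S_0$ and $S_i$; the critical value $c_\alpha(p, n_i, n_0)$ is not computed here) evaluates the root through the symmetric form $L^{-1}S_iL^{-T}$ with $S_0 = LL'$ and confirms that no direction gives a larger ratio.

```python
import numpy as np

def largest_root(s_i, s_0):
    """c_max(S_i S_0^{-1}) via the symmetric form L^{-1} S_i L^{-T}, S_0 = L L' (S_0 p.d.)."""
    l = np.linalg.cholesky(s_0)
    w = np.linalg.solve(l, np.linalg.solve(l, s_i).T)
    return np.linalg.eigvalsh(w).max()

rng = np.random.default_rng(8)
p, n0, ni = 3, 12, 4
g0, gi = rng.standard_normal((p, n0)), rng.standard_normal((p, ni))
s0, si = g0 @ g0.T / n0, gi @ gi.T / ni          # stand-ins for S_0 and S_i
cmax = largest_root(si, s0)

# Each direction a gives the F-type ratio a'S_i a / a'S_0 a; none exceeds c_max.
a = rng.standard_normal((p, 1000))
ratios = np.einsum("ij,ik,kj->j", a, si, a) / np.einsum("ij,ik,kj->j", a, s0, a)
print(ratios.max() <= cmax + 1e-9)               # True
```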

It must be noted that the above arguments for deriving a test were made solely to obtain, for the customary null hypotheses under the restricted multivariate Model II, if possible, a critical region which is the same as the one for the customary null hypotheses under the multivariate Model I. The use of the union-intersection principle to obtain $\omega_i$ from the $\omega_{ia}$ is rather artificial, since we do not have $H_{0i}$ itself as an intersection of hypotheses $H_{0ia}$.

2.6. Confidence statements. We shall first assume that we are dealing with restricted k-way classifications that have matrices like (2.4.2) and (2.4.3) satisfying both (2.4.6) and (2.4.7). As observed before, the multivariate analogues of the usual univariate complete block designs satisfy these requirements. Under the restricted multivariate Model II, we shall then obtain simultaneous confidence bounds on $\sigma_1^2, \cdots, \sigma_k^2$ and $c(\Sigma)$. Next, we shall relax the condition (2.4.7), i.e., we shall not require that the pseudo-Wishart distributions of the matrices like (2.4.2) in our analysis be independent. Under this relaxation, we shall obtain an alternative set of confidence bounds for the individual $\sigma_i^2$'s ($i = 1, 2, \cdots, k$).

If $(1/\lambda_i)S_i(p\times p) = (1/n_i\lambda_i)XQ_iX'$, for $i = 0, 1, \cdots, k$, have independent central pseudo-Wishart distributions with respective degrees of freedom $n_0, n_1, \cdots$ and $n_k$, then, by definition, we have

$$(1/n_i\lambda_i)XQ_iX' = (1/n_i)Y_i(p\times n_i)Y_i'(n_i\times p),$$

for $i = 0, 1, \cdots, k$, where the joint distribution of $Y_0, Y_1, \cdots, Y_k$ is

$$(2\pi)^{-p(n-1)/2}|\Sigma|^{-(n-1)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\sum_{i=0}^k Y_iY_i'\right]dY_0\cdots dY_k, \qquad -\infty < \text{all elements of } Y_i < \infty, \tag{2.6.1}$$

and where $E(Y_iY_i') = n_i\Sigma(p\times p)$ and $\sum_{i=0}^k n_i = n - 1$. It is well known that, for the symmetric positive definite matrix $\Sigma$, there exists an orthogonal matrix $\Gamma(p\times p)$ such that $\Sigma(p\times p) = \Gamma'D_c\Gamma$, where the p (non-zero) elements of the diagonal matrix $D_c$ are the p (positive) characteristic roots of $\Sigma$. Now making the transformation

$$D_c^{-1/2}\,\Gamma\,Y_i(p\times n_i) = Z_i(p\times n_i), \qquad (i = 0, 1, \cdots, k), \tag{2.6.2}$$

we can verify that the Jacobian is $|\Sigma|^{(n-1)/2}$, so that the joint distribution of $Z_0, Z_1, \cdots, Z_k$ is

$$(2\pi)^{-p(n-1)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\sum_{i=0}^k Z_iZ_i'\right]dZ_0\cdots dZ_k, \qquad -\infty < \text{all elements of } Z_i < \infty. \tag{2.6.3}$$

From (2.6.3), it can be seen, by analogy with the methods used in [8, 9, 12], that we can obtain constants $\mu_1(p, n_i, \alpha_i) = \mu_{i1}$ (say) and $\mu_2(p, n_i, \alpha_i) = \mu_{i2}$ (say), for $i = 0, 1, \cdots, k$, such that the statement

$$\mu_{i1} \le c_{\min}(Z_iZ_i') \le c_{\max}(Z_iZ_i') \le \mu_{i2} \tag{2.6.4}$$

has probability $(1 - \alpha_i)$, and the probability of statements like (2.6.4) holding simultaneously for $i = 0, 1, \cdots, k$ is $(1 - \alpha) = \prod_{i=0}^k(1 - \alpha_i)$. We note that $Z_0Z_0' = (n_0/\lambda_0)D_c^{-1/2}\Gamma S_0\Gamma'D_c^{-1/2}$, where $n_0S_0(p\times p)$ is the matrix due to error given by (2.4.3) and is symmetric positive definite. Therefore, starting from (2.6.4) with i = 0, and reasoning exactly as in section 1 of [9], we obtain the confidence statement

$$\frac{n_0}{\mu_{02}}c_{\min}(S_0) \le c_{\min}(\Sigma) \le c_{\max}(\Sigma) \le \frac{n_0}{\mu_{01}}c_{\max}(S_0), \tag{2.6.5}$$

with confidence coefficient $\ge (1 - \alpha_0)$.

Next, for any $i = 1, 2, \cdots, k$, we note that (2.6.4) is equivalent to

$$\mu_{i1} \le \frac{n_i}{\lambda_i}c_{\min}(D_c^{-1/2}\Gamma S_i\Gamma'D_c^{-1/2}) \le \frac{n_i}{\lambda_i}c_{\max}(D_c^{-1/2}\Gamma S_i\Gamma'D_c^{-1/2}) \le \mu_{i2}. \tag{2.6.6}$$

However, it is known that the non-zero characteristic roots of $A(p\times q)B(q\times p)$ are the same as those of $B(q\times p)A(p\times q)$, and that $c_{\min}(A_1)c_{\min}(A_2) \le$ all $c(A_1A_2) \le c_{\max}(A_1)c_{\max}(A_2)$, where $A_1(p\times p)$ is symmetric positive definite and $A_2(p\times p)$ is symmetric, at least positive semi-definite [cf. pp. A-5 and A-7 of [12] for proofs]. Using these two results we have

$$c_{\min}(\Gamma S_i\Gamma'D_c^{-1}) \le \frac{c_{\min}(S_i)}{c_{\min}(\Sigma)} \quad\text{and}\quad c_{\max}(\Gamma S_i\Gamma'D_c^{-1}) \ge \frac{c_{\max}(S_i)}{c_{\max}(\Sigma)},$$

so that (2.6.6) implies the statement

$$\frac{n_i\,c_{\max}(S_i)}{\mu_{i2}\,c_{\max}(\Sigma)} \le \lambda_i \le \frac{n_i\,c_{\min}(S_i)}{\mu_{i1}\,c_{\min}(\Sigma)}, \tag{2.6.7}$$

which, therefore, has a probability $\ge (1 - \alpha_i)$.
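Taking the statement (2.6.5) together with all statements like (2.6.7) for $i = 1, 2, \cdots, k$, we obtain the simultaneous statements

$$\begin{aligned}
&\frac{n_0}{\mu_{02}}c_{\min}(S_0) \le c_{\min}(\Sigma) \le c_{\max}(\Sigma) \le \frac{n_0}{\mu_{01}}c_{\max}(S_0),\\
&\frac{n_1\,c_{\max}(S_1)}{\mu_{12}\,c_{\max}(\Sigma)} \le \lambda_1 \le \frac{n_1\,c_{\min}(S_1)}{\mu_{11}\,c_{\min}(\Sigma)},\\
&\qquad\vdots\\
&\frac{n_k\,c_{\max}(S_k)}{\mu_{k2}\,c_{\max}(\Sigma)} \le \lambda_k \le \frac{n_k\,c_{\min}(S_k)}{\mu_{k1}\,c_{\min}(\Sigma)},
\end{aligned} \tag{2.6.8}$$

with a joint probability $\ge (1 - \alpha) = \prod_{i=0}^k(1 - \alpha_i)$. Recalling now, from (2.4.5), that $\lambda_i = v_i\sigma_i^2 + 1$ ($i = 1, 2, \cdots, k$), and using the leading statement in (2.6.8), we obtain the further statements implied by (2.6.8):

$$\begin{aligned}
&\frac{n_0}{\mu_{02}}c_{\min}(S_0) \le c_{\min}(\Sigma) \le c_{\max}(\Sigma) \le \frac{n_0}{\mu_{01}}c_{\max}(S_0),\\
&\frac{1}{v_1}\left[\frac{n_1\mu_{01}\,c_{\max}(S_1)}{n_0\mu_{12}\,c_{\max}(S_0)} - 1\right] \le \sigma_1^2 \le \frac{1}{v_1}\left[\frac{n_1\mu_{02}\,c_{\min}(S_1)}{n_0\mu_{11}\,c_{\min}(S_0)} - 1\right],\\
&\qquad\vdots\\
&\frac{1}{v_k}\left[\frac{n_k\mu_{01}\,c_{\max}(S_k)}{n_0\mu_{k2}\,c_{\max}(S_0)} - 1\right] \le \sigma_k^2 \le \frac{1}{v_k}\left[\frac{n_k\mu_{02}\,c_{\min}(S_k)}{n_0\mu_{k1}\,c_{\min}(S_0)} - 1\right],
\end{aligned} \tag{2.6.9}$$

which, therefore, are a set of simultaneous confidence bounds on $\sigma_1^2, \sigma_2^2, \cdots, \sigma_k^2$ and all $c(\Sigma)$, with a joint confidence coefficient $\ge (1 - \alpha)$, for a preassigned $\alpha$. These simultaneous confidence bounds on the set of $c(\Sigma)$ and $\sigma_1^2, \cdots, \sigma_k^2$ are obtained on the assumption of independence between the $(1/\lambda_i)S_i$ ($i = 1, 2, \cdots, k$). If this assumption were relaxed, we would still be able to obtain individual confidence bounds on $\sigma_1^2, \cdots, \sigma_k^2$ and the set of all $c(\Sigma)$, although simultaneous confidence bounds in this situation would be far more difficult to obtain.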
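Once the constants $\mu_{i1}, \mu_{i2}$ of (2.6.4) are available (from tables of the extreme characteristic roots, which are not reproduced here), evaluating (2.6.9) is simple arithmetic on characteristic roots. The sketch below, with a hypothetical helper name and made-up $\mu$ values, shows that arithmetic. Because the upper bound for $\sigma_i^2$ uses $c_{\min}(S_i)$, it assumes $n_i \ge p$, so that $S_i$ has full rank.

```python
import numpy as np

def simultaneous_bounds(s0, s_list, n0, n_list, v_list, mu0, mu_list):
    """Evaluate the bounds (2.6.9).

    mu0 = (mu_01, mu_02) and mu_list[i] = (mu_i1, mu_i2) are the constants of
    (2.6.4); they are treated as given inputs, not computed.  The upper bound
    for sigma_i^2 uses c_min(S_i), so this sketch assumes n_i >= p."""
    r0 = np.linalg.eigvalsh(s0)
    mu01, mu02 = mu0
    sigma_roots = (n0 / mu02 * r0.min(), n0 / mu01 * r0.max())
    var_bounds = []
    for s_i, n_i, v_i, (mu_i1, mu_i2) in zip(s_list, n_list, v_list, mu_list):
        ri = np.linalg.eigvalsh(s_i)
        lower = (n_i * mu01 * ri.max() / (n0 * mu_i2 * r0.max()) - 1.0) / v_i
        upper = (n_i * mu02 * ri.min() / (n0 * mu_i1 * r0.min()) - 1.0) / v_i
        var_bounds.append((lower, upper))
    return sigma_roots, var_bounds

# Illustrative inputs only: the mu values below are made up, not tabulated.
roots, comps = simultaneous_bounds(np.diag([1.0, 2.0]), [np.diag([4.0, 6.0])],
                                   n0=10, n_list=[3], v_list=[2.0],
                                   mu0=(5.0, 20.0), mu_list=[(0.5, 8.0)])
print(roots, comps)   # bounds 0.5 and 4.0 for c(Sigma); -0.21875 and 23.5 for sigma_1^2
```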

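We shall next obtain the alternative set of separate confidence bounds for $\sigma_1^2, \sigma_2^2, \cdots$ and $\sigma_k^2$.

If $Y_0(p\times n_0)$ and $Y_i(p\times n_i)$, where $p \le n_0$ but p may be greater than $n_i$, are such that $\operatorname{rank}(Y_0Y_0') = p$ and $\operatorname{rank}(Y_iY_i') = \min(p, n_i)$, and further if $Y_0$ and $Y_i$ have the joint distribution

$$(2\pi)^{-p(n_0+n_i)/2}|\Sigma|^{-(n_0+n_i)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}(Y_0Y_0' + Y_iY_i')\right]dY_0\,dY_i, \tag{2.6.10}$$

where $\Sigma(p\times p)$ is symmetric positive definite and $E(Y_0Y_0') = n_0\Sigma$, $E(Y_iY_i') = n_i\Sigma$, then Rao [7], in continuation of the work of Bartlett and Wald, has shown that, for large $m_i$, $-m_i\log_e\Lambda_i$ has approximately the central $\chi^2$-distribution with $pn_i$ degrees of freedom, where

$$\Lambda_i = \frac{|Y_0Y_0'|}{|Y_0Y_0' + Y_iY_i'|} \quad\text{and}\quad m_i = n_0 + n_i - \frac{p + n_i + 1}{2}. \tag{2.6.11}$$

Hence, we can find $\chi^2_{1\alpha_i}$ and $\chi^2_{2\alpha_i}$ such that the statement

$$\chi^2_{1\alpha_i} \le -m_i\log_e\Lambda_i \le \chi^2_{2\alpha_i}, \tag{2.6.12}$$

or, equivalently,

$$\mu_{1\alpha_i} \le \Lambda_i \le \mu_{2\alpha_i}, \qquad \left[\mu_{1\alpha_i} = \exp(-\chi^2_{2\alpha_i}/m_i),\; \mu_{2\alpha_i} = \exp(-\chi^2_{1\alpha_i}/m_i)\right],$$

has a probability $(1 - \alpha_i)$ for a preassigned $\alpha_i$.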
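A small simulation sketch of this large-sample result, with hypothetical dimensions: $\Sigma = I$ is used without loss of generality, since $\Lambda_i$ is unchanged when $Y_0$ and $Y_i$ are both premultiplied by the same non-singular matrix. With the multiplier $m_i$ of (2.6.11), the average of $-m_i\log_e\Lambda_i$ should be close to $pn_i$, the mean of the limiting $\chi^2$ distribution.

```python
import numpy as np

def wilks_lambda(y0, yi):
    """Lambda_i = |Y_0 Y_0'| / |Y_0 Y_0' + Y_i Y_i'|, computed on the log scale."""
    a = y0 @ y0.T
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_b = np.linalg.slogdet(a + yi @ yi.T)
    return np.exp(logdet_a - logdet_b)

def bartlett_m(p, n0, ni):
    return n0 + ni - (p + ni + 1) / 2.0           # m_i of (2.6.11)

rng = np.random.default_rng(9)
p, n0, ni, reps = 2, 40, 3, 4000
m_i = bartlett_m(p, n0, ni)
# Sigma = I loses no generality because Lambda_i is invariant under Y -> G Y.
stats = np.array([-m_i * np.log(wilks_lambda(rng.standard_normal((p, n0)),
                                             rng.standard_normal((p, ni))))
                  for _ in range(reps)])
print(stats.mean(), p * ni)                       # the mean should be close to p * n_i = 6
```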

Under the restricted multivariate Model II, for a restricted k-way classification, if we take the matrices $(1/\lambda_0)S_0, (1/\lambda_1)S_1, \cdots, (1/\lambda_k)S_k$ as in Section 2.4, then we have seen that $(1/\lambda_0)S_0$ is distributed in the central Wishart form with $n_0$ degrees of freedom and is distributed independently of $(1/\lambda_1)S_1, \cdots, (1/\lambda_k)S_k$. We have, also, by (2.4.5), that $\lambda_i = v_i\sigma_i^2 + 1$ ($i = 1, \cdots, k$) and $\lambda_0 = 1$. Further, if $(1/\lambda_1)S_1, \cdots, (1/\lambda_k)S_k$ satisfy the condition (2.4.6), so that they have central pseudo-Wishart distributions with degrees of freedom $n_1, \cdots, n_k$ (even though these may not be independent), then, by definition, we can write $n_0S_0 = Z_0(p\times n_0)Z_0'(n_0\times p)$ and

$$(n_i/\lambda_i)S_i = Z_i(p\times n_i)Z_i'(n_i\times p).$$

The joint distribution of $Z_0$ and $Z_i$ is then of the same form as (2.6.10), and, by analogy with the statements (2.6.10) to (2.6.12), we can find, for large $m_i$, constants $\mu_{1\alpha_i}$ and $\mu_{2\alpha_i}$ (depending on a central $\chi^2$-distribution with $pn_i$ degrees of freedom) such that the statement

$$\mu_{1\alpha_i} \le \frac{|n_0S_0|}{|n_0S_0 + (n_i/\lambda_i)S_i|} \le \mu_{2\alpha_i}, \tag{2.6.13}$$

or, equivalently,

$$\frac{1}{\mu_{2\alpha_i}} \le |t_iS_iS_0^{-1} + I(p)| \le \frac{1}{\mu_{1\alpha_i}}, \tag{2.6.14}$$

where $t_i = n_i/(n_0\lambda_i) > 0$, will have a probability $(1 - \alpha_i)$. If $\operatorname{rank}(S_i) = \min(p, n_i) = s_i$ (say), then (2.6.14) is seen to be equivalent to

$$\frac{1}{\mu_{2\alpha_i}} \le t_i^{s_i}\operatorname{tr}_{s_i}(S_iS_0^{-1}) + t_i^{s_i-1}\operatorname{tr}_{s_i-1}(S_iS_0^{-1}) + \cdots + t_i\operatorname{tr}(S_iS_0^{-1}) + 1 \le \frac{1}{\mu_{1\alpha_i}}, \tag{2.6.15}$$

where $\operatorname{tr}_s(A)$ denotes the sum of all sth-order principal minors of A. Using certain matrix factorization theorems given in [12, pp. A-15 to A-17], we can prove that $\operatorname{tr}_s(S_iS_0^{-1}) = \operatorname{tr}_s[\text{a symmetric, at least positive semi-definite matrix}] > 0$ for $s \le s_i$. Hence, all the coefficients of powers of $t_i$ in the middle part of (2.6.15) are real and positive. Next, since $|t_iS_iS_0^{-1} + I(p)| > 1$, in order that the bounds in (2.6.15) may be non-trivial, we should have $1/\mu_{2\alpha_i} > 1$.





Considering now the equality signs in (2.6.15), we obtain the equations 


1 
(g:)** tr, CS; So") + +++ + & tr (S; So") — (, . ) i) 
2a; 
(2.6.16) 


: 7 “i l 
(¢ _ tr.,(S; So’) + = + oi tr (S; So’) — e a 1) 0. 
la; 


rom well-known results in the theory of equations, it now follows that the 
equations (2.6.16) each have one and only one positive real root. Let these posi- 
tive real roots be denoted by @2.; and 6:4;. Then it is seen that (2.6.14) or 
(2.6.15) is equivalent to 


(2.6.17) Dia; = $1 = Ora; 


with a probability $(1 - \alpha_i)$. Recalling that $t_i = n_i/(n_0\lambda_i)$ and $\lambda_i = \nu_i\sigma_i^2 + 1$, we see that (2.6.17) is equivalent to the confidence interval statement

$$\frac{1}{\nu_i}\left[\frac{n_i}{n_0\,\theta_{1\alpha_i}} - 1\right] \le \sigma_i^2 \le \frac{1}{\nu_i}\left[\frac{n_i}{n_0\,\theta_{2\alpha_i}} - 1\right] \qquad (2.6.18)$$

with a confidence coefficient $(1 - \alpha_i)$, for a preassigned $\alpha_i$. We thus have, for $i = 1, 2, \ldots, k$, separate confidence interval statements for each of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$. Because of the complexity of the distribution problem involved, however, it would be far more difficult to obtain simultaneous confidence bounds on $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ by this method. Nor would the difficulty be appreciably reduced under this approach, even if we assumed that the $(1/\lambda_i)S_i$'s (for $i = 1, 2, \ldots, k$) were independent, as we did under the first approach.
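Every coefficient in (2.6.16) is positive and the constant term is negative, so each root can be found by a simple bracketing search. After that, (2.6.18) is plain arithmetic. The following sketch uses hypothetical values for the coefficients $\operatorname{tr}_s(S_i S_0^{-1})$, for $n_i$, $n_0$, $\nu_i$, and for the two percentage points $\gamma_{1\alpha_i}$ and $\gamma_{2\alpha_i}$. In practice the percentage points would come from the relevant distribution, which is not reproduced here. Bisection is used rather than a general polynomial solver because the sign structure guarantees that it converges.

```python
def unique_positive_root(tr, c, tol=1e-12):
    """Positive root of sum_{s>=1} tr[s] * t**s - c = 0.

    tr[s] = tr_s(S_i S_0^{-1}) for s = 1..s_i (tr[0] is ignored); all tr[s] >= 0
    with tr[1] > 0, and c > 0, so the left side increases from -c at t = 0.
    """
    def poly(t):
        return sum(tr[s] * t ** s for s in range(1, len(tr))) - c

    lo, hi = 0.0, 1.0
    while poly(hi) <= 0:          # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if poly(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sigma2_interval(tr, gamma1, gamma2, n_i, n_0, nu_i):
    """Interval (2.6.18) for sigma_i^2 from the two roots of (2.6.16)."""
    if not 1.0 / gamma2 > 1.0:
        raise ValueError("bounds are trivial unless 1/gamma2 > 1")
    theta2 = unique_positive_root(tr, 1.0 / gamma2 - 1.0)
    theta1 = unique_positive_root(tr, 1.0 / gamma1 - 1.0)
    lower = (n_i / (n_0 * theta1) - 1.0) / nu_i
    upper = (n_i / (n_0 * theta2) - 1.0) / nu_i
    return theta1, theta2, lower, upper


# Hypothetical placeholder inputs, for illustration only.
tr = [1.0, 4.0, 2.0]            # tr_0, tr_1, tr_2 with s_i = 2
theta1, theta2, lower, upper = sigma2_interval(
    tr, gamma1=0.05, gamma2=0.4, n_i=30, n_0=10, nu_i=3)
print(theta1, theta2, lower, upper)
```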

2.7 Concluding remarks: After the work presented in this paper and in [10] had been completed, it was brought to the attention of the authors that Bose [1] has, for the univariate case, given a general treatment, using slightly different methods, of a mixed model with one set of random components. A very recent paper by Zelen [13] also has some results, for the univariate case, on a mixed model with one set of random components as applied to Incomplete Block Designs, which are contained in [10].


REFERENCES 


[1] R. C. Bose, "Versuche in Unvollständigen Blöcken," Gastvorlesung, 3 bis 11. März 1955, Universität Frankfurt/M., Naturwissenschaftliche Fakultät.

[2] O. Carpenter, "Note on the extension of Craig's theorem to non-central variates," Ann. Math. Stat., Vol. 21 (1950), pp. 455-457.

[3] W. T. Federer, "Testing proportionality of covariance matrices," Ann. Math. Stat., Vol. 22 (1951), pp. 102-106.

[4] R. Gnanadesikan, "Contributions to multivariate analysis including univariate and multivariate variance components analysis and factor analysis," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 158 (1956).

[5] C. C. MacDuffee, The Theory of Matrices, J. Springer, Berlin, 1933, pp. 81-88.

[6] J. Ogawa, "On the independence of quadratic forms in a non-central normal system," Osaka Math. Jour., Vol. 2 (1950), pp. 151-159.





340 S. N. ROY AND R. GNANADESIKAN 


[7] C. R. Rao, "Tests of significance in multivariate analysis," Biometrika, Vol. 35 (1948), pp. 58-79.

[8] S. N. Roy and R. C. Bose, "Simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 24 (1953), pp. 513-536.

[9] S. N. Roy and R. Gnanadesikan, "Further contributions to multivariate confidence bounds," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 155 (1956).

[10] S. N. Roy and R. Gnanadesikan, "Some contributions to ANOVA in one or more dimensions: I," Ann. Math. Stat., Vol. 30 (1959), pp. 304-317.

[11] S. N. Roy, "The individual sampling distributions of the maximum, the minimum and any intermediate one of the p-statistics on the null hypothesis," Sankhya, Vol. 7 (1945), pp. 133-158.

[12] S. N. Roy, "A report on some aspects of multivariate analysis," Institute of Statistics, University of North Carolina, Mimeo. Series, No. 121 (1954).

[13] M. Zelen, "The analysis of incomplete block designs," J. Amer. Stat. Assn., Vol. 52 (1957), pp. 204-217.





UNBIASED ESTIMATION: FUNCTIONS OF LOCATION 
AND SCALE PARAMETERS 


By R. F. Tate¹
University of Washington 


1. Summary. Unbiased estimators for functions of a location parameter $\theta$ and a scale parameter $p$ are expressed as unknown functions in integral equations of convolution type, and are then obtained by integral transform methods. An outline of the paper is contained in Section 3. The main results consist in the application of various derived expressions to the exponential distribution with parameters $\theta$ and $p$, to the gamma and Weibull distributions with parameter $p$, and to general distributions with truncation parameter $\theta$. In the latter case, a simple formula is given for a minimum variance unbiased estimator of any absolutely continuous function of $\theta$; this extends slightly a result of Davis [3] concerning distributions of exponential type. Throughout the paper particular attention is paid to the estimation of the probability that a single observation will lie in a certain Borel set, when this probability is regarded as a function of the parameters $\theta$ and/or $p$. Extensions to sample points of $m$ observations and Borel sets in $m$-space are in most cases immediate.


2. Introduction. Estimation of location and scale parameters was first studied systematically by Pitman [15] through the use of fiducial functions. He showed that for a random sample $X_1, X_2, \ldots, X_n$ from a density $f(x - \theta)$ the estimator

$$\hat\theta(X_1, X_2, \ldots, X_n) = \frac{\int_{-\infty}^{\infty} \theta\, f(X_1 - \theta) f(X_2 - \theta)\cdots f(X_n - \theta)\, d\theta}{\int_{-\infty}^{\infty} f(X_1 - \theta) f(X_2 - \theta)\cdots f(X_n - \theta)\, d\theta} \qquad (2.1)$$

has minimum variance among the class of all estimators $U(X_1, X_2, \ldots, X_n)$ with the translation property, that is to say estimators satisfying the condition

$$U(X_1 + c, X_2 + c, \ldots, X_n + c) = U(X_1, X_2, \ldots, X_n) + c \qquad (2.2)$$

for all real $c$. This was his main result concerning unbiased estimation. The estimator for $p$, when the $X_i$'s have density $pf(px)$,
x. 
f ( -}dp 
\P 


X Xe 
J ur()r() 
(23) (Mi, Xs, +++, X) = —* Ae : 
i) ((8) 
Jp p p p 
was shown to have a negative bias, although it possesses optimal properties 


Received September 2, 1958; revised January 12, 1959. 
1 Research sponsored by the Office of Naval Research. 


341 





342 R. F. TATE 


among the class of estimators with the multiplicative property, that is among 
those estimators satisfying the condition 


$$U(cX_1, cX_2, \ldots, cX_n) = cU(X_1, X_2, \ldots, X_n) \qquad (2.4)$$


for all positive $c$. Sufficient statistics² need not exist in order to apply his results,
although their existence simplifies the labor; this is also a feature of the present 
paper. Pitman’s work has been extended by Girshick and Savage [6] and others 
in the direction of minimax estimation. 
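Formula (2.1) can be evaluated by one-dimensional quadrature even when no sufficient statistic exists. The following minimal sketch makes two assumptions: the uniform $\theta$-grid is wide enough to hold essentially all of the likelihood mass, and the trapezoidal rule is used, so the grid spacing cancels in the ratio. The function name and grid settings are illustrative and are not taken from Pitman [15]. For a normal $f$ the result reproduces the sample mean, and for any $f$ it has the translation property (2.2).

```python
import numpy as np


def pitman_location(x, logf, half_width=20.0, num=20001):
    """Evaluate the Pitman estimator (2.1) by trapezoidal quadrature in theta."""
    x = np.asarray(x, dtype=float)
    theta = np.linspace(x.mean() - half_width, x.mean() + half_width, num)
    # log of f(x_1 - theta) ... f(x_n - theta) at every grid point
    loglik = logf(x[None, :] - theta[:, None]).sum(axis=1)
    w = np.exp(loglik - loglik.max())        # rescaled to avoid underflow
    num_int = np.sum(theta[1:] * w[1:] + theta[:-1] * w[:-1])
    den_int = np.sum(w[1:] + w[:-1])
    return num_int / den_int                 # common factor dtheta/2 cancels


def log_normal(u):
    return -0.5 * u ** 2      # additive constants cancel in (2.1)


def log_laplace(u):
    return -np.abs(u)


sample = np.array([0.3, -1.2, 2.5, 0.9, 1.4])
print(pitman_location(sample, log_normal), sample.mean())
print(pitman_location(sample, log_laplace))
```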

The aim of the present paper is to consider only unbiased estimators, but to allow the functions to be estimated to be of a more general nature than those which Pitman considered. The methods used here are related to those of Washio, Morimoto, and Ikeda [16], in that they also use integral transform theory to obtain their results. Washio et al. deal with the Koopman-Pitman family of densities (see Koopman [12], and Pitman [14]) which possess sufficient statistics, and for which the range does not depend on the parameter. The joint density for $n$ independent random variables from a density of this type can be expressed as


$$p(x_1, x_2, \ldots, x_n \mid \tau) = K(\tau)\, e^{\beta(\tau) T(x_1, x_2, \ldots, x_n) + U(x_1, x_2, \ldots, x_n)},$$


where $\tau$ is a real parameter, $T$ is a sufficient statistic for $\tau$, and the positive sample space is independent of $\tau$ (except possibly for a set in $n$-space with Lebesgue measure zero). Now, the density of $T$ can be expressed as


$$f(t) = k(\tau)\, e^{\tau t + r(t)},$$


where the positive sample space of $T$ is also independent of $\tau$. Theorem 1 of Washio, Morimoto, and Ikeda [16] gives an estimator for a function $m(\tau)$. This estimator may be denoted by $\phi(T)$, where


$$\int \phi(t)\, e^{r(t)}\, e^{\tau t}\, dt = \frac{m(\tau)}{k(\tau)}, \qquad (2.5)$$

with integration taking place over the positive sample space of $T$. Two alternative sets of conditions restricting the functions $m(\tau)$ and $k(\tau)$ are contained in the hypothesis of their Theorem 1.

At this point it is possible to discuss the differences between their methods and those of the present paper, with respect to distributions which possess a single parameter. First, the family of densities which they consider is broader than ours in that $\tau$ can be any parameter, not only a parameter of location or scale; it is narrower in that we shall treat cases in which the range of the density depends on the parameter. Secondly, their hypotheses are designed to ensure that $m(\tau)/k(\tau)$ is itself a bilateral Laplace transform, since in that event $\phi(t)e^{r(t)}$ is determined (up to a set with probability measure


² By the existence of sufficient statistics is meant the existence of a small number of "simple" continuous real-valued functions of the sample that are jointly sufficient in the Halmos-Savage sense.





LOCATION AND SCALE PARAMETERS 343 


zero for all $\tau$) by inversion, and the unbiased estimator is then obtained from it after multiplication by $e^{-r(t)}$. In contrast to this, we shall in each case express our estimator $\phi(T)$ as a simple transformation of the unknown function of some integral equation of convolution type. The function $m(\tau)/k(\tau)$, or some simple transformation thereof, will then be required to possess a Laplace, bilateral Laplace, or Mellin transform, depending on what type of parameter is studied.

Side results in reference [16] concerning the actual process of inversion are 
offered, parallel to the statistical development. These results depend on the 
concept of a bounded linear translatable operation. This notion was dealt with 
by Kitagawa in a series of papers, including one (see Kitagawa [10]) which 
refers directly to the work in reference [16], and is also of independent interest in 
operational calculus. 


3. Notation and outline of results. Throughout, $X$ will denote the basic random variable, and $X_1, X_2, \ldots, X_n$ will be a random sample from the $X$ distribution. $X_S$ and $X_L$ will denote the smallest and largest observations, respectively, in such a sample. The following notation will be used for densities:

A density of $X$ which is completely specified will be denoted by $f(x)$.

For those situations in which the density of $X$ has a location parameter $\theta$ we denote the density by $f_\theta(x)$. Whenever the location parameter is actually a truncation parameter we will let

$$f_\theta(x) = k_1(\theta)h_1(x), \qquad \theta < x < \infty,\ -\infty < \theta < \infty;$$

$$f_\theta(x) = k_2(\theta)h_2(x), \qquad -\infty < x < \theta,\ -\infty < \theta < \infty.$$

For the case of a translation parameter the notation will be

$$f_\theta(x) = f(x - \theta).$$

A density of $X$ with a positive scale parameter $p$ will be written as

$$f_p(x) = pf(px).$$

For the two parameter problems we have the notation

$$f_{\theta p}(x) = pf(p(x - \theta)).$$


Throughout, $g_\theta(x)$, $g_p(x)$, and $g_{\theta p}(x)$ will denote densities of statistics.


In the scale parameter case one more often sees the notation $\frac{1}{p} f\!\left(\frac{x}{p}\right)$, as for example in Pitman's paper previously referred to. It is, however, slightly easier to cast the estimation problem into the framework within which Mellin transforms apply when the above form is used.
At various stages the Laplace, bilateral Laplace, and Mellin transforms will 
be used. The symbol x will generally denote the argument of a function to be 
³ All densities will be assumed positive in some interval of the real line, and moreover
piecewise continuous in the interior of that interval. Also, a density specified analytically 
over part of its domain will be assumed to vanish over the rest of its domain. 







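transformed, while $s$ will be used for the argument in the transform of a function. Thus, for a function $\phi(x)$ we shall write

$$\mathfrak{L}[\phi(x); s] = \int_0^{\infty} e^{-sx}\phi(x)\,dx, \qquad \operatorname{Re}(s) > \beta;$$

$$\mathfrak{B}[\phi(x); s] = \int_{-\infty}^{\infty} e^{-sx}\phi(x)\,dx, \qquad \alpha < \operatorname{Re}(s) < \beta;$$

$$\mathfrak{M}[\phi(x); s] = \int_0^{\infty} x^{s-1}\phi(x)\,dx, \qquad \alpha < \operatorname{Re}(s) < \beta.$$

For the inverse function associated with a transform $\delta(s)$ of one of the above types we shall write, respectively,

$$\mathfrak{L}^{-1}[\delta(s); x], \qquad \mathfrak{B}^{-1}[\delta(s); x], \qquad \mathfrak{M}^{-1}[\delta(s); x],$$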


all functions being unique up to a set of Lebesgue measure zero. Analytic ex- 
pressions for the real and complex inversion of the above transforms may be 
found in Widder [17]; however, they will not be used here since all problems 
considered can be solved by the tables of the Bateman Manuscript Project [1]. 

Finally, $\phi(X_1, X_2, \ldots, X_n)$ will denote the unbiased estimator in a given situation; if $\phi$ depends explicitly on some statistic $T(X_1, X_2, \ldots, X_n)$, the expression $\phi(T)$ will be used. The function of $\theta$ and/or $p$ which is to be estimated will be written as $\xi(\theta, p)$. As one example consider the formulation used in Section 2 to introduce some of the results in reference [16]. There we have


$$\xi(\tau) = m(\tau), \qquad \phi(T) = e^{-r(T)}\,\mathfrak{B}^{-1}\!\left[\frac{m(-s)}{k(-s)}; T\right]; \qquad (3.1)$$

the estimator is then expressed as $\phi(T)$ since in their case it depends explicitly on the sufficient statistic $T(X_1, X_2, \ldots, X_n)$.

Section 4, on the estimation of functions of a scale parameter $p$, begins with some simple observations concerning densities with scale parameters and statistics which are homogeneous functions of the sample. A formula is then derived which provides unbiased estimators for many functions $\xi(p)$ with Mellin transforms. The result is specialized to the case $\xi(p) = P(X \in A \mid p)$. All estimators depend explicitly on some homogeneous statistic $H(X_1, X_2, \ldots, X_n)$.

Densities which may be factored into a product of a function of $\theta$ and a function of $x$, with the range having $\theta$ as one endpoint, are very common in applications. These are the densities which are referred to as having a truncation parameter $\theta$. Either $X_S$ or $X_L$ will be a sufficient statistic. Section 5 contains the derivation, and some applications, of results which provide minimum variance unbiased estimators for a wide class of functions $\xi(\theta)$. The formulas used in this section are extremely simple, due to a fortunate relationship involving conditional expectations, and require only a single differentiation for their application.
The question of the estimation of $\theta$ for the cases of truncation at either or both endpoints of the range of a density of the Koopman-Pitman family was investigated by Davis⁴ [3] in 1951. Our work constitutes a simplification and generalization of part of his study. Connections are discussed at the end of Section 5.

Section 6 is devoted to the case of a translation parameter $\theta$. For those situations in which the density of $X$ is positive over $(-\infty, \infty)$ or $(\theta, \infty)$ the derived estimators have the translation property. In the former instance the problem can be handled by the methods of reference [16] whenever a sufficient statistic exists; this is not so for the latter case. It is shown that when the positive sample space is $(\theta, b)$, for some fixed $b > \theta$, there exists no unbiased estimator with the translation property. For this case a formula is provided for an unbiased estimator of $\theta$ based on a single observation. Results are in general less satisfactory for the translation parameter case because of the lack of sufficient statistics; Pitman's estimator for $\xi(\theta) = \theta$ is usually difficult to compute. Kolmogorov [11] derived the minimum variance unbiased estimator for $P(X \in A \mid \theta)$ in the case $X = N(\theta, \sigma^2)$. The estimator was obtained again by Washio et al. [16] as an application of their theorems; they also derived the minimum variance unbiased estimator of $[P(X \in A \mid \theta)]^m$. We shall use this example as one of the illustrations of our methods, and point out that it is a special case of a slightly more general result concerning stable laws. Still another way of arriving at Kolmogorov's result has appeared in the literature as one of the cases considered by Hirschman and Widder [9]. Their approach is briefly described.

In Section 7 almost all results are applied to the very important cases of the exponential, gamma, and Weibull distributions. Some of the applications are presented in the framework of a recent paper by Birnbaum and Saunders [2], which clarifies the need for the latter two distributions in life testing. The exponential case has been studied by Epstein and Sobel [4]; they consider estimators based on the first $r$ of $n$ ordered observations. In the present paper, minimum variance unbiased estimators are found in particular for $[P(X \in A \mid p)]^m$, $m = 1, 2, \ldots, n - 1$, in the gamma and Weibull cases and for $[P(X \in A \mid \theta, p)]^m$, $m = 1, 2, \ldots$, in the exponential case. It is shown that some of the calculations can be carried out with the Table of the Incomplete Beta Function [13].


4. Unbiased estimation for $\xi(p)$. Let $X$ be a random variable with density $f_p(x) = pf(px)$. We wish to find an unbiased estimator for a function $\xi(p)$. First we shall make a few observations which justify the use of homogeneous statistics in problems of this kind.

If $H(X_1, X_2, \ldots, X_n)$ is homogeneous of degree $a \ne 0$, with density $g(x)$ when $p = 1$, then for all values of $p$, $H(X_1, X_2, \ldots, X_n)$ has the density $p^a g(p^a x)$. This follows immediately from these considerations: the conditional density of $H(X_1, X_2, \ldots, X_n)$, given $p = 1$, is the same as the unconditional density of $H(pX_1, pX_2, \ldots, pX_n)$. Thus, by homogeneity, $p^{-a}H(pX_1, pX_2, \ldots, pX_n) = H(X_1, X_2, \ldots, X_n)$ must have density $p^a g(p^a x)$.


⁴ The author would like to thank W. Hoeffding for bringing the paper of R. C. Davis to
his attention. 







If we restrict our attention to estimators depending explicitly on some homogeneous function $H$ of degree $a$, for example⁵ $X_S$, $\sum X_i$, $\prod X_i$, then the problem of estimating $\xi(p)$ becomes that of solving the integral equation

$$\int \phi(x)\, p^a g(p^a x)\, dx = \xi(p) \qquad \text{(I)}$$

for the unknown function $\phi$, and then expressing the estimator as

$$\phi(H(X_1, X_2, \ldots, X_n)).$$

Basic integral equations will be designated by roman numerals.

Since the case $\xi(p) = p^r$ can be handled in an elementary way, it will be treated first. A family of estimators is provided by the following theorem, the proof of which is immediate.

THEOREM 1: If $X$ has density $pf(px)$, and $H(X_1, X_2, \ldots, X_n)$ is a homogeneous function of degree $a \ne 0$, then an unbiased estimator is provided for $\xi(p) = p^r$ by⁶

$$\phi(H) = \frac{H^{-r/a}}{E_1(H^{-r/a})},$$

for all values of $r$ and $a$ for which the indicated expectation exists.

PROOF: Let $H$ have the density $p^a g(p^a x)$ guaranteed by the preliminary statements in this section. Then consider the integral

$$\int x^{-r/a}\, p^a g(p^a x)\, dx. \qquad (4.1)$$

The stated result follows after the substitution $y = p^a x$, which turns (4.1) into $p^r \int y^{-r/a} g(y)\, dy = p^r E_1(H^{-r/a})$.
We may wish to estimate the $100q$th percentile of the $X$ distribution. This is equivalent to setting

$$\xi(p) = \frac{b_q}{p}, \qquad \int_{-\infty}^{b_q} f(x)\,dx = q, \qquad (4.2)$$

which is the case $r = -1$ of Theorem 1, multiplied by the constant $b_q$. The quantity $b_q$ is usually available in a table. The value of the theorem stems mainly from the fact that sufficient statistics frequently have the homogeneity property. For example, $H(X_1, X_2, \ldots, X_n) = \sum X_i^a$ is sufficient for $p$ in the Weibull density with (known) exponent $a$, namely

$$f_p(x) = a p^a x^{a-1} e^{-(px)^a}, \qquad 0 < x < \infty,\ 0 < p < \infty,\ a \ge 1.$$
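Theorem 1 is easy to check by simulation in the Weibull case. With $H = \sum X_i^a$, each $p^a X_i^a$ is exponential with unit mean, so under $p = 1$ the statistic $H$ is gamma with shape $n$, and $E_1(H^{-r/a}) = \Gamma(n - r/a)/\Gamma(n)$ whenever $n > r/a$. The following sketch uses illustrative values of $p$, $a$, $n$, $r$ and checks unbiasedness by Monte Carlo. `numpy.random.Generator.weibull(a)` draws from the density $a x^{a-1}e^{-x^a}$, which is the case $p = 1$. The function name is a hypothetical label.

```python
import math

import numpy as np


def unbiased_power_of_scale(H, n, a, r):
    """Theorem 1, Weibull case: unbiased estimator of p**r from H = sum(X_i**a)."""
    if not n > r / a:
        raise ValueError("E_1(H^(-r/a)) is infinite unless n > r/a")
    # Under p = 1, H ~ Gamma(n, 1), hence E_1(H^(-r/a)) = Gamma(n - r/a) / Gamma(n)
    e1 = math.exp(math.lgamma(n - r / a) - math.lgamma(n))
    return H ** (-r / a) / e1


rng = np.random.default_rng(12345)
p, a, n, r = 2.0, 1.5, 5, 1.0             # illustrative values
reps = 200_000
X = rng.weibull(a, size=(reps, n)) / p    # density a p^a x^(a-1) exp(-(p x)^a)
H = (X ** a).sum(axis=1)                  # homogeneous of degree a
estimates = unbiased_power_of_scale(H, n, a, r)
print(estimates.mean(), p ** r)           # Monte Carlo mean vs. target p^r
```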


Subject to certain, not too stringent, conditions, the integral equation (I) can be solved by use of the Mellin transform. The exact result is given by

THEOREM 2: Let $X$ have density $pf(px)$, and $H$ be a nonnegative,⁷ homogeneous


⁵ Summations and products without indices will be assumed to have index running from 1 to $n$.

⁶ The symbol $E_p$ stands for expectation under the condition that the parameter value is $p$.

7 In order to use the classical Mellin transform it is necessary to consider either a non- 







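statistic with density $p^a g(p^a x)$. Assume that both $g(x)$ and $\xi(p)$ have Mellin transforms. If there exists an unbiased estimator $\phi(H)$ for which the Mellin transform exists, then it is determined uniquely by

$$\phi(H) = \frac{1}{H}\,\mathfrak{M}^{-1}\!\left[\frac{\mathfrak{M}[x^{-1}\xi(x^{1/a}); s]}{\mathfrak{M}[g(x); s]}; \frac{1}{H}\right].$$

PROOF: Consider equation (I). Replace $p$ by $p^{1/a}$, and then make the variable change $y = 1/x$, which produces the equation

$$\int_0^{\infty} \frac{1}{y}\,\phi\!\left(\frac{1}{y}\right) g\!\left(\frac{p}{y}\right) \frac{dy}{y} = \frac{1}{p}\,\xi(p^{1/a}). \qquad \text{(I}'\text{)}$$

This is a well-known expression of (Mellin) convolution type. Application of the Mellin transform to both sides yields

$$\mathfrak{M}\!\left[\frac{1}{x}\phi\!\left(\frac{1}{x}\right); s\right] \mathfrak{M}[g(x); s] = \mathfrak{M}\!\left[\frac{1}{x}\xi(x^{1/a}); s\right], \qquad (4.3)$$

from which the conclusion follows.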
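The step from the convolution equation in the proof to (4.3) uses the convolution property of the Mellin transform. Write $h(y) = y^{-1}\phi(1/y)$ for the left-hand factor, and assume that the order of integration may be interchanged, as the hypotheses of the theorem are meant to allow. The substitution $x = uy$ then gives

$$
\begin{aligned}
\mathfrak{M}\!\left[\int_0^\infty h(y)\,g\!\left(\frac{x}{y}\right)\frac{dy}{y};\, s\right]
&= \int_0^\infty x^{s-1}\int_0^\infty h(y)\,g\!\left(\frac{x}{y}\right)\frac{dy}{y}\,dx \\
&= \int_0^\infty h(y)\,y^{s-1}\,dy \int_0^\infty u^{s-1} g(u)\,du \\
&= \mathfrak{M}[h(x); s]\;\mathfrak{M}[g(x); s].
\end{aligned}
$$

Dividing by $\mathfrak{M}[g(x); s]$, inverting, and evaluating at the point $1/H$ then gives the formula of Theorem 2.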

In view of the importance in applications of estimating $\xi(p) = P(X \in A \mid p)$, the result concerning this function will be stated as a corollary. The function $\xi(p) = pf(pz)$, for fixed $z$, is easier to work with in most cases; from an unbiased estimator of it one can derive an unbiased estimator for $P(X \in A \mid p)$ (Kolmogorov [11], page 22).

COROLLARY: Let the conditions of Theorem 2 be satisfied for the functions $H(X_1, X_2, \ldots, X_n)$ and $\xi(p) = pf(pz)$, where $f(x)$ is a density which vanishes for negative $x$, and $z$ is a fixed positive number.⁸ If there exists an unbiased estimator of $\xi(p)$ with a Mellin transform, it will be given by

$$\phi(H) = \frac{1}{H}\,\mathfrak{M}^{-1}\!\left[\frac{|a|\,\mathfrak{M}[f(x); a(s-1)+1]}{z^{a(s-1)+1}\,\mathfrak{M}[g(x); s]}; \frac{1}{H}\right]. \qquad (4.4)$$

PROOF. The result follows from Theorem 2 and the fact that

$$\mathfrak{M}\!\left[\frac{1}{x}\xi(x^{1/a}); s\right] = \mathfrak{M}[x^{1/a-1} f(x^{1/a}z); s] = \frac{|a|\,\mathfrak{M}[f(x); a(s-1)+1]}{z^{a(s-1)+1}}.$$

negative or a nonpositive statistic H, so that its density will vanish on the left or right half 
of the reals respectively. The proof is conducted for nonnegative $H$; the obvious modification consisting of replacing $H$ by $-H$ and $g(x)$ by $g(-x)$ will provide the result for nonpositive $H$.

⁸ If the corollary is stated instead for a density $f(x)$ which vanishes for positive $x$, and for a fixed negative number $z$, then the conclusion as stated can be modified by replacing $f(x)$ by $f(-x)$ and $z$ by $-z$ in order to give the desired answer. The reasoning is the same as in footnote 7. Note also that the assumption of nonnegativity for $H$ was carried over from Theorem 2, but can be removed in favor of nonpositivity by the alteration described in footnote 7.







Once we have an estimator for $pf(pz)$ for fixed $z$, estimation of $P(X \in A \mid p)$ can be performed if an interchange of the order of integration is valid. More precisely, denote the estimator above by $\phi(z \mid H)$, to emphasize the dependence on $z$. We always have

$$P(X \in A \mid p) = \int_A pf(pz)\,dz = \int_A \int_0^{\infty} \phi(z \mid x)\, p^a g(p^a x)\,dx\,dz. \qquad (4.5)$$

If now, for example, $\phi(z \mid x) \ge 0$ for $x \ge 0$ and $z \in A$, which is usually the case, then

$$\phi_A(H) = \int_A \phi(z \mid H)\,dz \qquad (4.6)$$

will be an unbiased estimator for $P(X \in A \mid p)$. This is a consequence of Fubini's theorem. The method can also be replaced by the following scheme, if so desired. We can consider the unbiased estimator for $P(X \le z \mid p)$, for fixed $z$, which can usually be found in a manner similar to our derivation of $\phi(z \mid H)$. Call this estimator $\phi^*(z \mid H)$. Then $P(X \in A \mid p)$ will be estimated by

$$\phi_A(H) = \int_A d\phi^*(z \mid H). \qquad (4.7)$$


This will be required later on when we speak of the location parameter situations. 
Thus, it will be seen in Section 7 that the gamma and Weibull distributions with 
scale parameter p can be treated by the Corollary, while in Section 5 all trunca- 
tion parameter problems will be handled by the Stieltjes integral method. 
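As a hypothetical illustration of the Corollary and of (4.6), take $f(x) = e^{-x}$, $x > 0$, so that $X$ is exponential with rate $p$. Take $H = \sum X_i$, which is homogeneous of degree $a = 1$ with $g(x) = x^{n-1}e^{-x}/\Gamma(n)$. Then $\mathfrak{M}[f; s] = \Gamma(s)$ and $\mathfrak{M}[g; s] = \Gamma(n+s-1)/\Gamma(n)$. Through the Mellin transform of $(1-x)^{n-2}$ on $(0,1)$, (4.4) inverts to $\phi(z \mid H) = \frac{n-1}{H}(1 - z/H)^{n-2}$ for $0 < z < H$, and $0$ otherwise. Integrating over $A = (0, c]$ as in (4.6) gives $1 - (1 - c/H)^{n-1}$ for $c < H$, and $1$ otherwise. The following sketch checks both results by simulation. The parameter values and function names are illustrative only.

```python
import numpy as np


def density_estimate(z, H, n):
    """phi(z | H): unbiased for p*exp(-p*z) when H = X_1 + ... + X_n, X_i ~ Exp(rate p)."""
    ratio = np.clip(1.0 - z / H, 0.0, None)
    return np.where(z < H, (n - 1) / H * ratio ** (n - 2), 0.0)


def prob_estimate(c, H, n):
    """(4.6) with A = (0, c]: integral of density_estimate over 0 < z <= c."""
    return 1.0 - np.clip(1.0 - c / H, 0.0, None) ** (n - 1)


rng = np.random.default_rng(2024)
p, n, z, c = 1.5, 6, 0.4, 0.8             # illustrative values
H = rng.gamma(shape=n, scale=1.0 / p, size=400_000)   # distribution of sum X_i
print(density_estimate(z, H, n).mean(), p * np.exp(-p * z))
print(prob_estimate(c, H, n).mean(), 1.0 - np.exp(-p * c))
```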
Finally, it should be noted that the probability that $(X_1, X_2, \ldots, X_m)$ lies in a set $A_m$ in $m$-space can be estimated over the set $A_m$ in either of two ways. One is the ordinary integral of the estimator $\phi(z_1, z_2, \ldots, z_m \mid H)$ of the function $p^m f(pz_1)f(pz_2)\cdots f(pz_m)$. The other is the Stieltjes integral of the estimator $\phi^*(z_1, z_2, \ldots, z_m \mid H)$ of

$$P(X_1 \le z_1, X_2 \le z_2, \ldots, X_m \le z_m \mid p).$$

In connection with this paragraph see Kolmogorov ([11], Section 8).


5. Unbiased estimation for functions of a truncation parameter. Following the notation of Section 3, we define two types of densities with truncation parameters. We include densities over any range $(a, b)$, finite or infinite.

Type I: $f_\theta(x) = k_1(\theta)h_1(x)$, $a < \theta < x < b$;

Type II: $f_\theta(x) = k_2(\theta)h_2(x)$, $a < x < \theta < b$.

In connection with these densities we assume the following: $h_1(x)$ and $h_2(x)$ are nonnegative, continuous, and integrable over $(\theta, b)$ and $(a, \theta)$, respectively, for $\theta$ in $(a, b)$. The pairs $k_1(\theta)$, $h_1(x)$ and $k_2(\theta)$, $h_2(x)$ satisfy the obvious relations

$$k_1(\theta) = \frac{1}{\int_\theta^b h_1(x)\,dx}, \qquad k_2(\theta) = \frac{1}{\int_a^\theta h_2(x)\,dx}, \qquad (5.1)$$

for $a < \theta < b$.







Any completely specified density $f(x)$ defined over $(a, b)$, which satisfies the continuity and integrability conditions imposed on $h_1(x)$ and $h_2(x)$, will be of Type I or Type II as soon as a truncation parameter $\theta$ is introduced. Since $(a, b)$ may possibly be $(-\infty, +\infty)$, the situation is one of some generality. If a random sample $X_1, X_2, \ldots, X_n$ is considered, then $X_S$ is sufficient in the Type I case, and $X_L$ is sufficient in the Type II case. The purpose of this section is to derive simple expressions which lead to minimum variance unbiased estimators for a wide class of functions $\xi(\theta)$. Results will be stated for a finite interval $(a, b)$ and then amended to take care of the infinite case.

THEOREM 3: Let $X$ have a density $f_\theta(x)$ which is of Type I over some finite interval $(a, b)$, and $\xi(\theta)$ be a function which is absolutely continuous over $(a, b)$. A minimum variance unbiased estimator for $\xi(\theta)$ is given uniquely by

$$\phi(X_S) = \xi(X_S) - \frac{\xi'(X_S)}{n\,k_1(X_S)\,h_1(X_S)}.$$


PROOF: The density of $X_S$ is easily shown to be

$$p_\theta(x) = n[k_1(\theta)]^n h_1(x)\left(\int_x^b h_1(t)\,dt\right)^{n-1}, \qquad \theta < x < b.$$

Completeness for the family $\{p_\theta(x)\}$ is equivalent to the proposition that

$$\int_\theta^b \psi(x)\,h_1(x)\left(\int_x^b h_1(t)\,dt\right)^{n-1} dx = 0 \quad \text{for all } \theta \text{ in } (a, b)$$

implies $\psi(x) = 0$ a.e. This follows from a well-known result in measure theory due to Lebesgue. We now obtain a simple unbiased estimator in order to apply the Rao-Blackwell-Lehmann-Scheffé method on the sufficient statistic $X_S$. The hypothesis allows the equation for a single observation,

$$\int_\theta^b \psi(x_1)\,k_1(\theta)h_1(x_1)\,dx_1 = \xi(\theta),$$

to be differentiated with respect to $\theta$. This results in the relation

$$\psi(x_1) = \frac{\xi(x_1)k_1'(x_1) - k_1(x_1)\xi'(x_1)}{[k_1(x_1)]^2 h_1(x_1)}. \qquad (5.2)$$

One can easily show that

$$\phi(X_S) = E[\psi(X_1) \mid X_S] = \frac{1}{n}\psi(X_S) + \left(1 - \frac{1}{n}\right)\frac{\int_{X_S}^b \psi(x_1)\,k_1(\theta)h_1(x_1)\,dx_1}{\int_{X_S}^b k_1(\theta)h_1(x_1)\,dx_1}. \qquad (5.3)$$

Recalling the relation between $k_1(\theta)$ and $h_1(x)$, we see that the ratio of integrals in the right member is $\xi(X_S)$. From this fact and the simplifying relationship $k_1'(X_S)/\{[k_1(X_S)]^2 h_1(X_S)\} = 1$, the stated conclusion now follows.
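The two simplifications used at the end of the proof can be made explicit. Differentiating (5.1) gives $k_1'(\theta) = [k_1(\theta)]^2 h_1(\theta)$, so (5.2) reduces to $\psi(x_1) = \xi(x_1) - \xi'(x_1)/[k_1(x_1)h_1(x_1)]$. The conditional expectation $E[\psi(X_1) \mid X_S]$ then becomes

$$
\phi(X_S) = \frac{1}{n}\left[\xi(X_S) - \frac{\xi'(X_S)}{k_1(X_S)h_1(X_S)}\right] + \left(1 - \frac{1}{n}\right)\xi(X_S) = \xi(X_S) - \frac{\xi'(X_S)}{n\,k_1(X_S)\,h_1(X_S)}.
$$

The same steps, with $k_2'(\theta) = -[k_2(\theta)]^2 h_2(\theta)$, account for the change of sign in the Type II result.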

REMARK 1: For the case $b = \infty$ the same result is obtained if integrability is







postulated for [E(21) — ki(21) &’(a:)] / hi(x,) over every subinterval (@, © ); this 
holds also for $a = -\infty$.

REMARK 2: Suppose the conditions above are satisfied for finite $a$ and $b$, and the corresponding estimator, which we shall now denote by $\phi_b(X_S)$, is bounded by an integrable function $G(X_S)$ which has a finite second moment. Then the minimum variance unbiased estimator for the $(a, \infty)$ case can be found by computing

$$\phi_\infty(X_S) = \lim_{b \to \infty} \phi_b(X_S),$$

whenever this limit exists. This follows from the fact that

$$\int_\theta^b \phi_b(x)\,n h_1(x)\left(\int_x^b h_1(t)\,dt\right)^{n-1} dx = \xi(\theta)\left(\int_\theta^b h_1(t)\,dt\right)^n.$$

As $b \to \infty$, the right member approaches⁹ $\xi(\theta)/[k_{1\infty}(\theta)]^n$ for each $\theta$ in $(a, \infty)$, and Lebesgue's dominated convergence theorem applied to the left member yields

$$\int_\theta^\infty \phi_\infty(x)\,n[k_{1\infty}(\theta)]^n h_1(x)\left(\int_x^\infty h_1(t)\,dt\right)^{n-1} dx = \xi(\theta).$$


EXAMPLE 1: Consider a population of incomes, all of which are at least equal to a certain (unknown) minimum $\theta$, but at most equal to a certain (known) maximum $b$. Let $\xi(\theta) = \theta$. An individual income might reasonably be assumed to follow the truncated Pareto law with density

$$f_\theta(x) = \frac{\theta}{x^2\,(1 - \theta/b)}, \qquad 0 < \theta < x < b.$$

In this example

$$k_1(\theta) = \frac{\theta}{1 - \theta/b}, \qquad h_1(x) = \frac{1}{x^2}.$$

An application of Theorem 3 shows immediately that

$$\phi(X_S) = X_S - \frac{1}{n}X_S\left(1 - \frac{X_S}{b}\right), \qquad 0 < X_S < b, \qquad (5.4)$$

is the desired estimator. Notice that $h_1(x)$ is not integrable over $(0, b)$, and that this is not required. Remark 2 applies here, and

$$\phi_\infty(X_S) = \lim_{b \to \infty}\phi(X_S) = X_S\left(1 - \frac{1}{n}\right)$$

is minimum variance unbiased for the density $f_\theta(x) = \theta/x^2$, $0 < \theta < x < \infty$.
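Theorem 3 needs only $\xi$, $\xi'$, $k_1$ and $h_1$, so it translates directly into code. The following sketch implements the general formula under the hypothetical name `tate_type1` and specializes it to the truncated Pareto law of Example 1. It then confirms agreement with the closed form (5.4) and checks unbiasedness by simulation, using inverse-transform sampling from $F_\theta(x) = k_1(\theta)(1/\theta - 1/x)$. The values of $b$, $\theta$, $n$ are illustrative.

```python
import numpy as np


def tate_type1(t, n, xi, dxi, k1, h1):
    """Theorem 3: phi(t) = xi(t) - xi'(t) / (n k1(t) h1(t)), evaluated at t = X_S."""
    return xi(t) - dxi(t) / (n * k1(t) * h1(t))


b, theta, n = 10.0, 2.0, 5                 # illustrative values


def k1(t):
    return t / (1.0 - t / b)               # Example 1


def h1(x):
    return 1.0 / x ** 2


def xi(t):
    return t                               # estimate theta itself


def dxi(t):
    return np.ones_like(t)


rng = np.random.default_rng(7)
u = rng.uniform(size=(300_000, n))
# Solve F_theta(x) = u:  1/x = 1/theta - u (1/theta - 1/b)
x = 1.0 / (1.0 / theta - u * (1.0 / theta - 1.0 / b))
xs = x.min(axis=1)                         # X_S, the sufficient statistic

phi = tate_type1(xs, n, xi, dxi, k1, h1)
closed_form = xs - xs * (1.0 - xs / b) / n  # (5.4)
print(phi.mean(), xs.mean(), theta)       # phi is unbiased; X_S alone is biased upward
```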
For a Type II density there exists an entirely similar theorem; statements analogous to Remarks 1 and 2 also apply. We state the result for completeness.

THEOREM 4: Let $X$ have a density $f_\theta(x)$ which is of Type II over some finite


⁹ The symbol $k_{1\infty}(\theta)$ denotes the limit of $k_1(\theta)$ as $b \to \infty$.







interval $(a, b)$, and $\xi(\theta)$ be a function which is absolutely continuous over $(a, b)$. A minimum variance unbiased estimator for $\xi(\theta)$ is given uniquely by

$$\phi(X_L) = \xi(X_L) + \frac{\xi'(X_L)}{n\,k_2(X_L)\,h_2(X_L)}.$$


EXAMPLE 2: For the Pareto law in Example 1 we can change the assumptions to a (known) minimum income $a$ and an unknown maximum $\theta$. Then the density becomes

$$f_\theta(x) = \frac{a}{x^2\,(1 - a/\theta)}, \qquad 0 < a < x < \theta,$$

and the corresponding estimator for $\theta$ is

$$\phi(X_L) = X_L + \frac{1}{n}X_L\left(\frac{X_L}{a} - 1\right), \qquad a < X_L < \infty. \qquad (5.5)$$

Note that if $a = 0$, $f_\theta(x)$ is no longer a density. The estimators for $P(X \in A \mid \theta)$ are given for both types of densities by the


COROLLARY: Let $X_1, X_2, \ldots, X_n$ be a random sample from a density of Type I or Type II, and let

$$\xi(\theta) = P(X \le z \mid \theta),$$

where $a < z < b$. Minimum variance unbiased estimators obtained from Theorems 3 and 4 are

Type I:
$$\phi^*(z \mid X_S) = \begin{cases} 0, & a < z < X_S < b, \\[4pt] 1 - \left(1 - \dfrac{1}{n}\right)\dfrac{\int_z^b h_1(x)\,dx}{\int_{X_S}^b h_1(x)\,dx}, & a < X_S \le z < b; \end{cases}$$

Type II:
$$\phi^*(z \mid X_L) = \begin{cases} \left(1 - \dfrac{1}{n}\right)\dfrac{\int_a^z h_2(x)\,dx}{\int_a^{X_L} h_2(x)\,dx}, & a < z < X_L < b, \\[4pt] 1, & a < X_L \le z < b. \end{cases}$$

This is the case mentioned after the corollary to Theorem 2 in Section 4, in which a Stieltjes integral is required to estimate $P(X \in A \mid \theta)$. Both of the above estimators are mixed distribution functions with $z$ as the variable; a mass of $1/n$ exists at $z = X_S$ in the first case and at $z = X_L$ in the second case. The following expressions can be written for the estimators, and cover all cases. Details will be omitted.¹⁰


¹⁰ The symbols $A(X_S, b)$ and $A(a, X_L)$ denote set intersections, and $I_A$ is the characteristic function of the set $A$.







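Type I:
$$\phi_A(X_S) = \frac{1}{n}I_A(X_S) + \left(1 - \frac{1}{n}\right)\frac{\int_{A(X_S, b)} h_1(x)\,dx}{\int_{X_S}^b h_1(x)\,dx};$$

Type II:
$$\phi_A(X_L) = \frac{1}{n}I_A(X_L) + \left(1 - \frac{1}{n}\right)\frac{\int_{A(a, X_L)} h_2(x)\,dx}{\int_a^{X_L} h_2(x)\,dx}.$$

Estimators can also be obtained for $P((X_1, X_2, \ldots, X_m) \in A_m \mid \theta)$; see the last paragraph of Section 4.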

It is noteworthy that in these problems the distribution function of $X$ is continuous, but the estimator of $P(X \le z \mid \theta)$, regarded as a function of $z$, is the distribution function of a mixed distribution.
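The mixed character of $\phi^*$ is easy to see in the simplest Type II case, $h_2(x) = 1$ on $(0, \theta)$. This means $X$ is uniform on $(0, \theta)$ with $a = 0$, an illustrative choice that is not one of the paper's examples. The Corollary then gives $\phi^*(z \mid X_L) = (1 - 1/n)\,z/X_L$ for $z < X_L$ and $1$ for $z \ge X_L$: a continuous part plus a jump of $1/n$ at $z = X_L$. The following minimal sketch includes a simulation check of unbiasedness. The values of $\theta$, $n$, $z$ are arbitrary, and the function name is hypothetical.

```python
import numpy as np


def cdf_estimate_uniform(z, x_max, n):
    """phi*(z | X_L) for the Type II density 1/theta on (0, theta)."""
    return np.where(z < x_max, (1.0 - 1.0 / n) * z / x_max, 1.0)


rng = np.random.default_rng(11)
theta, n, z = 1.0, 4, 0.3                 # illustrative values
x_max = rng.uniform(0.0, theta, size=(400_000, n)).max(axis=1)
print(cdf_estimate_uniform(z, x_max, n).mean(), z / theta)

# The jump of size 1/n at z = X_L, for one observed sample maximum
xl = 0.8
left = cdf_estimate_uniform(xl - 1e-12, xl, n)
print(cdf_estimate_uniform(xl, xl, n) - left)   # close to 1/n
```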

Davis [3] has looked into the question of estimation of a parameter $\theta$ in a truncated distribution. He assumes a distribution of the Koopman-Pitman family, with truncation points $a(\theta)$ and $b(\theta)$ which are continuous when regarded as functions of $\theta$. Restriction to a subfamily of distributions for which $X_S$ and/or $X_L$ are sufficient takes his density into the factored form considered in the present section. The main part of his paper concerns results regarding a single sufficient statistic. He shows, among other things, that in those cases for which a "single" sufficient statistic (one "simple" function; see footnote 2) exists, one of the endpoints is a monotone decreasing function of the other; this extends the work of Pitman [14].

The point of contact with our work occurs in his Section 4, when he estimates $\theta$ under the condition that either $a(\theta)$ or $b(\theta)$ is identically constant. In that event $X_L$ or $X_S$, respectively, is a sufficient statistic for $\theta$. As an example, consider the case which he discusses of a density which is positive over the range $(a, b(\theta))$, with $b(\theta)$ a monotone function. Suppose one wishes to estimate $\xi^*(\theta)$ instead of just $\theta$. Then, let $\xi(\theta) = \xi^*(b^{-1}(\theta))$. Theorem 4 yields the estimator

$$\phi(X_L) = \xi^*(b^{-1}(X_L)) + \frac{\dfrac{d}{dX_L}\,\xi^*(b^{-1}(X_L))}{n\,k_2(X_L)\,h_2(X_L)},$$

which, for the case $\xi^*(\theta) = \theta$, reduces to formula (8) of [3].

Hypotheses for our Theorems 3 and 4 are expressed somewhat differently than 
Davis’ conditions, and in our derivation the Rao-Blackwell theorem occupies a 
more central role (see expression (5.3) and the remarks following). 


6. Unbiased estimation for functions of a translation parameter. This section is divided into two parts; the first deals with densities having the form

$$f_\theta(x) = f(x - \theta), \qquad -\infty < x < \infty,\ -\infty < \theta < \infty;$$







the second deals with densities of the form

$$f_\theta(x) = \frac{f(x - \theta)}{\int_\theta^b f(t - \theta)\,dt}, \qquad -\infty < \theta < x < b,$$

where $b$ is any fixed number.

Let $U(X_1, X_2, \ldots, X_n)$ denote any statistic with the translation property (see (2.2)). When $X$ has density $f(x - \theta)$, $U$ is known to have a density with form $g(x - \theta)$, $-\infty < x < \infty$. Thus, for an arbitrary function $\xi(\theta)$ we consider the estimation equation

$$\int_{-\infty}^{\infty} \phi(x)\,g(x - \theta)\,dx = \xi(\theta). \qquad \text{(II)}$$

As in the case of equation (I) in Section 4, this equation is essentially of convolution type. The following simple theorem may be stated.

THEOREM 5: Let $X$ have density $f(x - \theta)$, $U(X_1, X_2, \ldots, X_n)$ be a statistic with density $g(x - \theta)$, and $\xi(\theta)$ be a function with a bilateral Laplace transform. If there exists an unbiased estimator $\phi(U)$ with a bilateral Laplace transform, it will be determined uniquely by

$$\phi(U) = \mathfrak{B}^{-1}\!\left[\frac{\mathfrak{B}[\xi(-x); s]}{\mathfrak{B}[g(x); s]}; -U\right].$$

PROOF: Replace $x$ by $-x$ and $\theta$ by $-\theta$ in equation (II); then take the bilateral Laplace transform¹¹ of each side of the equation.


COROLLARY: If X has density f(x − θ), and U(X_1, X_2, ..., X_n) is a statistic with density g(x − θ), then the function

$$\xi(\theta) = P(X \le z \mid \theta),$$

z fixed, has the unbiased estimator

$$\varphi^*(z \mid U) = \mathfrak{B}^{-1}\left[\frac{\mathfrak{B}[f(x); s]}{s\,\mathfrak{B}[g(x); s]};\ z - U\right],$$

provided the right member exists, and hence P(X ∈ A | θ) has the unbiased estimator φ_A(U) = ∫_A dφ*(z | U).
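PROOF:

$$\mathfrak{B}[\xi(-x); s] = \mathfrak{B}\left[\int_{-\infty}^{z+x} f(y)\, dy;\ s\right] = \frac{e^{zs}}{s}\,\mathfrak{B}[f(x); s].$$

Thus, from Theorem 5 we have

$$\varphi(-x) = \mathfrak{B}^{-1}\left[\frac{e^{zs}\,\mathfrak{B}[f(x); s]}{s\,\mathfrak{B}[g(x); s]};\ x\right]. \tag{6.1}$$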


¹¹ Recall that for the case of a density g(x) which vanishes for x < 0, 𝔅[g(x); s] can be replaced by 𝔏[g(x); s].





354 R. F. TATE 


The latter expression can be rewritten in the form

$$\varphi(-x) = \mathfrak{B}^{-1}\left[\frac{\mathfrak{B}[f(x); s]}{s\,\mathfrak{B}[g(x); s]};\ x + z\right],$$

which is equivalent to the conclusion of the corollary.

If φ*(z | U) is itself an integral of a function φ(z | U), then the latter will be an unbiased estimator for ξ(θ) = f(z − θ), and can be found by omitting the s in the denominator of the right member before inverting.

An extension to the vector case, analogous to those of the corollaries of Sections 4 and 5, also holds for the present corollary.

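EXAMPLE 3: The corollary allows another derivation of a result of Kolmogorov [11] (see also [16]). Let X be normal with mean θ and known variance σ_0²; let U(X_1, X_2, ..., X_n) = X̄. Then,

$$\mathfrak{B}[f(x); s] = e^{\sigma_0^2 s^2/2}, \qquad \mathfrak{B}[g(x); s] = e^{\sigma_0^2 s^2/2n}.$$

The estimator

$$\varphi(z \mid \bar X) = \mathfrak{B}^{-1}\left[e^{\sigma_0^2 s^2 (1 - 1/n)/2};\ z - \bar X\right] = \frac{1}{\sigma_0\sqrt{2\pi(1 - 1/n)}} \exp\left[-\frac{(z - \bar X)^2}{2\sigma_0^2 (1 - 1/n)}\right] \tag{6.2}$$

is minimum variance unbiased for

$$\xi(\theta) = f(z - \theta) = \frac{1}{\sigma_0\sqrt{2\pi}}\, e^{-(z-\theta)^2/2\sigma_0^2}. \tag{6.3}$$

It can easily be shown that for 1 ≤ m < n

$$f(z_1 - \theta) f(z_2 - \theta) \cdots f(z_m - \theta) = \left(\frac{1}{\sigma_0\sqrt{2\pi}}\right)^m \exp\left[-\sum_{j=1}^m (z_j - \theta)^2 / 2\sigma_0^2\right] \tag{6.4}$$

is estimated by

$$\varphi(z_1, z_2, \ldots, z_m \mid \bar X) = \frac{1}{(\sigma_0\sqrt{2\pi})^m (1 - m/n)^{1/2}} \exp\left[-\sum_{j=1}^m (z_j - \bar z)^2 / 2\sigma_0^2\right] \exp\left[-\frac{(\bar z - \bar X)^2}{2\sigma_0^2 (1/m - 1/n)}\right], \tag{6.5}$$

where z̄ = (z_1 + ... + z_m)/m.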
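As a numerical sanity check of (6.2)-(6.5), the following Python sketch (an illustration added here, not taken from the paper) averages the estimator (6.5) over the N(θ, σ_0²/n) sampling distribution of X̄ with a trapezoidal rule and compares the result with the product (6.4). The helper names and quadrature settings are arbitrary choices; m = 1 reproduces (6.2).

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def phi_65(z, xbar, sigma0, n):
    """Estimator (6.5) of f(z_1 - theta)...f(z_m - theta), for 1 <= m < n."""
    m = len(z)
    zbar = sum(z) / m
    s2 = sigma0 ** 2
    const = 1.0 / ((sigma0 * math.sqrt(2.0 * math.pi)) ** m * math.sqrt(1.0 - m / n))
    within = math.exp(-sum((zj - zbar) ** 2 for zj in z) / (2.0 * s2))
    between = math.exp(-(zbar - xbar) ** 2 / (2.0 * s2 * (1.0 / m - 1.0 / n)))
    return const * within * between

def target_64(z, theta, sigma0):
    """Left member of (6.4): product of N(theta, sigma0^2) densities at the z_j."""
    return math.prod(normal_pdf(zj - theta, sigma0 ** 2) for zj in z)

def expected_phi(z, theta, sigma0, n, half_width=12.0, steps=20000):
    """E_theta[phi_65(z | Xbar)] with Xbar ~ N(theta, sigma0^2 / n), trapezoidal rule."""
    sd = sigma0 / math.sqrt(n)
    a = theta - half_width * sd
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi_65(z, x, sigma0, n) * normal_pdf(x - theta, sd * sd)
    return total * h

for z in ([0.2], [0.2, 1.9]):
    ratio = expected_phi(z, theta=0.7, sigma0=1.5, n=5) / target_64(z, 0.7, 1.5)
    print(len(z), ratio)   # should equal 1 up to quadrature error
```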


In his derivation of the above result Kolmogorov [11] utilized the close connection between the "source solution" of the heat equation

$$\psi(x, t) = \frac{1}{\sqrt{4\pi t}}\, e^{-x^2/4t}, \qquad 0 < t < \infty, \quad -\infty < x < +\infty, \tag{6.6}$$

and both the kernel of the equation and the function to be estimated. Hirschman and Widder [9] considered the heat equation in a similar, but not identical, manner in order to solve convolution equations. They defined the Weierstrass transform of a function φ(x) as

$$\int_{-\infty}^{+\infty} \varphi(x)\, \psi(\theta - x, 1)\, dx = \xi(\theta). \tag{6.7}$$

Inversion is accomplished by the operator exp(−tD²), defined by

$$e^{-tD^2} \xi(x) = \frac{1}{\sqrt{4\pi t}} \int_{-\infty}^{\infty} e^{-y^2/4t}\, e^{iyD} \xi(x)\, dy.$$

Then e^{−tD²}ξ(x) is shown to satisfy the heat equation, and

$$\varphi(x) = \lim_{t \to 1^-} e^{-tD^2} \xi(x). \tag{6.8}$$


Hirschman and Widder, in a series of papers summarized in [9], consider in general the convolution equation

$$\int_{-\infty}^{+\infty} \varphi(x)\, G(\theta - x)\, dx = \xi(\theta), \tag{6.9}$$

where G(x) is a function with a bilateral Laplace transform 1/E(s), E(s) having the form

$$E(s) = e^{-cs^2 + bs} \prod_{k=1}^{\infty} \left(1 - \frac{s}{a_k}\right) e^{s/a_k}, \tag{6.10}$$

with 0 ≤ c < ∞, −∞ < a_k < ∞, −∞ < b < ∞, Σ a_k⁻² < ∞. Motivated by the methods of operational calculus, they assign a meaning to the operator E(D), and then write the solution of their basic equation as φ(x) = E(D)ξ(x). Their work constitutes a rather elegant unification of many separate integral transform theories under the general heading of the convolution transform, but it is essentially non-statistical. This may perhaps best be seen from the fact that one of the cornerstones of their theory is the notion of a variation diminishing kernel; that is, a kernel G such that the number of sign changes of ξ(x) in (−∞, +∞) does not exceed the number of sign changes of φ(x) in (−∞, +∞) when φ, ξ, and G are related as above. This concept has no special operational meaning in statistical problems, which accounts for the fact that Kolmogorov, as well as Washio, Morimoto, and Ikeda, who were interested in problems of statistical inference, were apparently unaware of some of their work. The function of the Weierstrass transform in the Hirschman-Widder theory is to take care of the factor exp(−cs²) in E(s).

EXAMPLE 4: The normal distribution is a special member of the class of stable distributions. Let us recall three facts concerning this class (see Gnedenko and Kolmogorov [7], Chapter 7). First, the characteristic function must have the form (Khinchin and Lévy)

$$C(t) = \exp\left[i\theta t - \beta |t|^\alpha \left(1 + i\gamma\, \operatorname{sgn}(t) \tan\frac{\pi\alpha}{2}\right)\right], \qquad -\infty < \theta < \infty,\ \beta \ge 0,\ 0 < \alpha \le 2,\ \alpha \ne 1,\ |\gamma| \le 1,$$







and another form (which will not be used) for α = 1. Also, all nondegenerate stable distributions are continuous (Khinchin), and for 1 < α ≤ 2 have entire distribution functions (Lapin). Thus, a nondegenerate stable distribution has a density f which satisfies our conditions. Let α, β, γ be known, with 1 < α ≤ 2, U(X_1, X_2, ..., X_n) = X̄, and ξ(θ) = f(z − θ). Theorem 5 and its corollary can also be stated in terms of characteristic functions (which is easier in this case than converting C(t) to a bilateral Laplace transform). Using this fact we obtain

$$\varphi(z \mid \bar X) = \mathcal{C}^{-1}\left[\exp\left\{-\beta\left(1 - n^{1-\alpha}\right)|t|^\alpha \left(1 + i\gamma\, \operatorname{sgn}(t) \tan\frac{\pi\alpha}{2}\right)\right\};\ z - \bar X\right], \tag{6.11}$$

where 𝒞⁻¹[ · ; x] denotes inversion of a characteristic function, evaluated at x.


Therefore,

$$\varphi_A(\bar X) = \int_A \left(1 - n^{1-\alpha}\right)^{-1/\alpha} f\left(\frac{z - \bar X}{(1 - n^{1-\alpha})^{1/\alpha}}\right) dz \tag{6.12}$$

is an unbiased estimator for P(X ∈ A | θ).





For a density of type f(x − θ), and any statistic U with the translation property, it is well known (and also follows from our discussion earlier) that if E_θ(U) exists for any value of θ, it will exist for all θ, and that φ(U) = U − E_0(U) is unbiased for ξ(θ) = θ. It appears reasonable that if the first moment of a density f(x − θ) fails to exist, there will be no unbiased estimator for θ based on a single observation X_1. The strongest result known to the author in this direction is: for a density f(x − θ) = 0 when x < θ there is no estimator φ which is bounded in every interval (0, c) and such that the order condition φ(x + y) = O(φ(x) + φ(y)) holds for large x and y. The proof of this somewhat unnatural assertion will be omitted.

The existence of an unbiased estimator may well depend on the sample size. In fact, the following observation, stated as a theorem, will be of some interest in this connection. The proof is simple and will be omitted.

THEOREM 6: If X has the density f(x) satisfying the conditions f(x) = 0 for x < 0, and

$$f(x) = O\left(\frac{1}{x^{1 + a + k/n}}\right)$$

for large x, where k and n are positive integers and a > 0, then E X_s^k will exist when X_s is based on n observations.

From the unparametrized density of X_s,

$$p(x) = n f(x)\,[1 - F(x)]^{n-1}, \qquad 0 < x < \infty,$$

where F(x) is the distribution function of X, an integration by parts shows that

$$E_0(X_s) = \int_0^\infty x\, p(x)\, dx = \int_0^\infty [1 - F(x)]^n\, dx \tag{6.13}$$

whenever the right member exists.







EXAMPLE 5: Let X have the half-Cauchy density f(x) = (2/π)(1 + x²)⁻¹, 0 < x < ∞, with distribution function F(x) = (2/π) tan⁻¹ x, x > 0. If we are interested in E(X_s), then k = 1, and by Theorem 6 the smallest sample size which will ensure its existence is n = 2. In that event E X_s² will not exist; in general for n = k, the first k − 1 moments of X_s will exist. It is clear that

$$E_0(X_s) = \int_0^\infty \left(1 - \frac{2}{\pi} \tan^{-1} x\right)^n dx.$$

From the table of Gröbner and Hofreiter ([8], page 156) we see that

$$\varphi(X_s) = X_s - \frac{2}{\pi} \sum_{\nu=0}^{\infty} \frac{(-1)^\nu (1 - 2\nu)\, B_{2\nu}\, \pi^{2\nu}}{(n + 2\nu - 1)(2\nu)!}, \qquad n \ge 2, \tag{6.14}$$

where B_0 = 1, B_2 = 1/6, B_4 = −1/30, ... are the Bernoulli numbers, is an unbiased estimator of θ in the density f(x − θ).
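The series in (6.14) can be checked against (6.13) directly. The sketch below (an illustration; the function names are hypothetical) evaluates the series with exact Bernoulli numbers and compares it with Simpson's rule applied to (6.13) after the substitution x = cot u, which turns the integral into (2/π)^n ∫_0^{π/2} u^n csc²u du. For n = 2 both should be close to 4 ln 2/π ≈ 0.8825.

```python
import math
from fractions import Fraction

def bernoulli_numbers(count):
    """B_0, ..., B_{count-1} as exact fractions (convention B_1 = -1/2, B_2 = 1/6)."""
    b = [Fraction(0)] * count
    b[0] = Fraction(1)
    for m in range(1, count):
        b[m] = -sum(math.comb(m + 1, k) * b[k] for k in range(m)) / Fraction(m + 1)
    return b

def e0_series(n, terms=40):
    """Series of (6.14): E_0(X_s) for the minimum of n half-Cauchy variables, n >= 2."""
    b = bernoulli_numbers(2 * terms)
    total = 0.0
    for v in range(terms):
        total += ((-1) ** v * (1 - 2 * v) * float(b[2 * v]) * math.pi ** (2 * v)
                  / ((n + 2 * v - 1) * math.factorial(2 * v)))
    return 2.0 / math.pi * total

def e0_quadrature(n, steps=2000):
    """(6.13) after x = cot(u): (2/pi)^n * integral_0^{pi/2} u^n / sin(u)^2 du (Simpson)."""
    def g(u):
        if u == 0.0:
            return 1.0 if n == 2 else 0.0   # limit of u^n / sin(u)^2 as u -> 0
        return u ** n / math.sin(u) ** 2
    h = (math.pi / 2) / steps
    s = g(0.0) + g(math.pi / 2)
    s += sum((4 if i % 2 else 2) * g(i * h) for i in range(1, steps))
    return (2.0 / math.pi) ** n * s * h / 3.0

def phi_614(x_s, n):
    """Unbiased estimator (6.14) of theta for the shifted half-Cauchy density."""
    return x_s - e0_series(n)

for n in (2, 3, 5):
    print(n, e0_series(n), e0_quadrature(n))
```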
EXAMPLE 6: Let X have the density f(x) = exp[−(e^x − x − 1)], 0 < x < ∞, with distribution function F(x) = 1 − exp(1 − e^x). Then

$$E_0(X_s) = \int_0^\infty e^{n(1 - e^x)}\, dx = -e^n\, \mathrm{Ei}(-n),$$

so for n ≥ 1

$$\varphi(X_s) = X_s + e^n\, \mathrm{Ei}(-n) \tag{6.15}$$

is an unbiased estimator for θ in the density f(x − θ).
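A quick check of (6.15): the sketch below computes E_1(n) = −Ei(−n) from its power series (Python's standard library has no exponential integral, so this hand-rolled series is a stand-in that is adequate for moderate n) and compares e^n E_1(n) with a Simpson evaluation of (6.13). Names and step counts are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expint_e1(x, terms=80):
    """E_1(x) = -Ei(-x) for x > 0 via -gamma - ln x - sum_k (-x)^k / (k k!)."""
    series = 0.0
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k            # term == (-x)**k / k!
        series += term / k
    return -EULER_GAMMA - math.log(x) - series

def e0_quadrature(n, upper=5.0, steps=4000):
    """(6.13) with 1 - F(x) = exp(1 - e^x): integral_0^upper exp(n(1 - e^x)) dx (Simpson)."""
    g = lambda x: math.exp(n * (1.0 - math.exp(x)))
    h = upper / steps
    s = g(0.0) + g(upper) + sum((4 if i % 2 else 2) * g(i * h) for i in range(1, steps))
    return s * h / 3.0

def phi_615(x_s, n):
    """Unbiased estimator (6.15): X_s + e^n Ei(-n) = X_s - e^n E_1(n)."""
    return x_s - math.exp(n) * expint_e1(n)

for n in (1, 2, 3):
    print(n, math.exp(n) * expint_e1(n), e0_quadrature(n))
```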
We shall now consider truncated versions of the densities f(x − θ), namely,

$$f_\theta(x) = \frac{f(x-\theta)}{\int_\theta^b f(t-\theta)\, dt}, \qquad -\infty < \theta < x < b.$$

Results are not analogous to those of the untruncated case, as the following theorem shows. All estimators occurring in the sequel will carry the subscript b.

THEOREM 7: If X has the density f_θ(x), θ < x < b, introduced above, then there exists no unbiased estimator for θ which has the translation property.
PROOF: Let φ_b(X_1, X_2, ..., X_n) be such an estimator. We can write

$$\varphi_b(X_1, X_2, \ldots, X_n) = X_1 + \varphi_b(0, X_2 - X_1, \ldots, X_n - X_1).$$

Consequently,

$$\theta = E_\theta\, \varphi_b(X_1, X_2, \ldots, X_n) = E_\theta(X_1) + c$$

for some c and all θ < b, since the joint distribution of (X_2 − X_1, X_3 − X_1, ..., X_n − X_1) is independent of θ, and the expectation of the bounded random variable X_1 must exist. Thus, φ_b is unbiased if and only if X_1 + c is unbiased. This quickly leads to

$$\int_0^{b-\theta} (x + c)\, f(x)\, dx = 0, \qquad -\infty < \theta < b,$$

which implies that f(x) = 0 for almost all x > 0, contradicting the fact that f(x) is a density.

It is difficult to construct a method leading directly to unbiased estimators based on the whole sample for the truncated case. There is, however, a way to construct an estimator for θ based on a single observation X_1. An average could then of course be obtained from a sample X_1, X_2, ..., X_n. Since the construction is based on an integral equation of somewhat more interest than equations (I) and (II), in Sections 4 and 5 respectively, it will be presented as

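THEOREM 8: If X has the density

$$f_\theta(x) = \frac{f(x-\theta)}{\int_\theta^b f(t-\theta)\, dt}, \qquad \theta < x < b,$$

then, if there exists an unbiased estimator φ_b(X_1) of θ which has a Laplace transform, it will be determined uniquely by the relation¹²

$$\varphi_b(X_1) = b - \mathfrak{L}^{-1}\left[\frac{1}{s^2} - \frac{\mathfrak{Q}[f(x); s]}{s\, \mathfrak{L}[f(x); s]};\ b - X_1\right].$$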


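PROOF: The estimation equation can be written in the form

$$\int_\theta^b [b - \varphi_b(x_1)]\, f(x_1 - \theta)\, dx_1 = (b - \theta) \int_\theta^b f(t - \theta)\, dt.$$

Now, let

$$b - \varphi_b(b - x) = \psi(x), \qquad b - x_1 = y, \qquad b - \theta = r.$$

This produces the equation

$$\int_0^r \psi(y)\, f(r - y)\, dy = r \int_0^r f(y)\, dy, \qquad 0 < r < \infty. \tag{III}$$

The transform 𝔏[ψ(x); s] exists by hypothesis, and 𝔏[f(x); s] exists because f(x) is a density over (0, ∞). Hence,

$$\mathfrak{L}[\psi(x); s]\; \mathfrak{L}[f(x); s] = \mathfrak{L}\left[x \int_0^x f(y)\, dy;\ s\right], \tag{6.16}$$

and by virtue of the relation

$$\mathfrak{L}\left[x \int_0^x f(y)\, dy;\ s\right] = -\frac{d}{ds}\, \mathfrak{L}\left[\int_0^x f(y)\, dy;\ s\right] = \frac{1}{s}\left(\frac{\mathfrak{L}[f(x); s]}{s} - \mathfrak{Q}[f(x); s]\right), \tag{6.17}$$

we obtain

$$\mathfrak{L}[\psi(x); s] = \frac{1}{s^2} - \frac{\mathfrak{Q}[f(x); s]}{s\, \mathfrak{L}[f(x); s]}.$$

This proves the theorem.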


¹² 𝔔[f(x); s] = (d/ds) 𝔏[f(x); s].







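EXAMPLE 7: Consider the density

$$f_\theta(x) = \frac{\frac{1}{\sqrt{2\pi}}\, e^{-1/[2(x-\theta)]}\, (x-\theta)^{-3/2}}{\int_\theta^b \frac{1}{\sqrt{2\pi}}\, e^{-1/[2(t-\theta)]}\, (t-\theta)^{-3/2}\, dt}, \qquad \theta < x < b.$$

From the Bateman Project Tables ([1], p. 246) we obtain 𝔏[f(x); s] = exp(−√(2s)). Also, 𝔔[f(x); s] = −(2s)^{−1/2} exp(−√(2s)). Therefore,

$$\varphi_b(X_1) = b - \mathfrak{L}^{-1}\left[\frac{1}{s^2} + \frac{1}{\sqrt{2}\, s^{3/2}};\ b - X_1\right] = X_1 - \left(\frac{2}{\pi}\right)^{1/2} (b - X_1)^{1/2} \tag{6.18}$$

is an unbiased estimator for θ. We can also state that X̄ − (2/π)^{1/2} n⁻¹ Σ_{i=1}^n (b − X_i)^{1/2} is unbiased for θ. Note that the density obtained from f_θ(x) by putting b = ∞ has no first moment, and it is conjectured that it then admits no unbiased estimator based on a single observation.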
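To see (6.18) at work, the following sketch (an illustration, not from the paper) computes E_θ[φ_b(X_1)] by numerical integration under the truncated density of Example 7 and checks that it returns θ. The substitution x = b − u² is a convenience chosen here to remove the square-root singularity of the integrand at x = b; the parameter values are arbitrary.

```python
import math

def levy_density(y):
    """Unparametrized density f(y) = (2*pi)^(-1/2) y^(-3/2) exp(-1/(2y)), y > 0."""
    if y <= 1e-3:
        return 0.0   # f(y) < 1e-200 here; treating it as 0 avoids 0/0 underflow
    return math.exp(-1.0 / (2.0 * y)) / (math.sqrt(2.0 * math.pi) * y ** 1.5)

def simpson(g, a, b, steps=4000):
    h = (b - a) / steps
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, steps))
    return s * h / 3.0

def phi_618(x1, b):
    """Estimator (6.18): X_1 - sqrt(2/pi) * sqrt(b - X_1)."""
    return x1 - math.sqrt(2.0 / math.pi) * math.sqrt(b - x1)

def expected_phi_618(theta, b):
    """E_theta[phi_618(X_1)] under f(x - theta) / integral_theta^b f(t - theta) dt."""
    root = math.sqrt(b - theta)
    # x = b - u^2, so dx = 2u du and x - theta = b - theta - u^2
    weight = lambda u: levy_density(b - theta - u * u) * 2.0 * u
    num = simpson(lambda u: phi_618(b - u * u, b) * weight(u), 0.0, root)
    den = simpson(weight, 0.0, root)
    return num / den

for theta, b in ((1.0, 3.0), (-2.0, 0.5), (0.0, 10.0)):
    print(theta, expected_phi_618(theta, b))
```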


7. Applications to Life-Length Distributions. The purpose of this section is 
to derive minimum variance unbiased estimators for functions of the parameters 
of the important distributions in life-testing, namely the exponential, gamma, 
and Weibull distributions. The exponential distribution has been found to give a 
good fit to length-of-life data in many situations not involving fatigue: for ex- 
ample, lengths of telephone conversations, and lengths of life for electron tubes. 
The two main distributions which describe life-length under fatigue are the 
gamma and the Weibull distributions. Although the exponential distribution is a 
special case of both of the latter, it is not suitable since its use carries the impli- 
cation that at any time future life-length is independent of past history. This 
appears untenable per se and has also been virtually disproved empirically by 
Freudenthal and Gumbel ([5], p. 579) in their work on the fatigue of metals.
The model of Birnbaum and Saunders [2], which helps to explain the roles played 
by the gamma and Weibull distributions in fatigue testing, will be discussed 
briefly. 


EXPONENTIAL DISTRIBUTION. Let X have the density

$$f_{\theta,\rho}(x) = \rho\, e^{-\rho(x - \theta)}, \qquad x > \theta. \tag{7.1}$$

It is well known that (X_s, Σ X_i) is a (vector) sufficient statistic for the (vector) parameter (θ, ρ). It will be convenient in what follows to consider instead (X_s, X̄ − X_s), which of course is also sufficient for (θ, ρ). This statistic was extensively discussed by Epstein and Sobel [4]. They derived minimum variance unbiased estimators for θ and ρ after verifying completeness for (X_s, X̄ − X_s). For our purposes the joint density of X_s and X̄ − X_s is required; it is implicitly contained in the work of Epstein and Sobel, but was not written down. In view of the facts that X_s and V = Σ(X_i − X_s) are independent, and that V is distributionally equivalent to the sum of n − 1 independent exponential random variables when θ = 0 and ρ = 1, it is quickly shown that X_s and Y = X̄ − X_s = V/n have joint density

$$h_{\theta,\rho}(x, y) = \frac{n^n \rho^n}{(n-2)!}\, y^{n-2}\, e^{-n\rho(x - \theta)}\, e^{-n\rho y}, \qquad \theta < x < \infty, \quad 0 < y < \infty. \tag{7.2}$$


All estimators given below will be functions of X_s and Y, and will have minimum variance.

For a general function ξ(θ, ρ) the estimation equation is

$$\int_\theta^\infty \int_0^\infty \varphi(x_s, y)\, y^{n-2}\, e^{-n\rho(x_s - \theta)}\, e^{-n\rho y}\, dy\, dx_s = \frac{(n-2)!\; \xi(\theta, \rho)}{(n\rho)^n}. \tag{IV}$$

If ξ(θ, ρ) has one partial derivative with respect to θ, and if there exists an unbiased estimator φ(X_s, Y) which for almost all X_s is continuous and has a Laplace transform in Y, this estimator will be

$$\varphi(X_s, Y) = \frac{(n-2)!}{Y^{n-2}}\, \mathfrak{L}^{-1}\left[\frac{\xi(\theta, s/n)}{s^{n-1}} - \frac{\frac{\partial \xi}{\partial \theta}(\theta, s/n)}{s^n};\ Y\right]_{\theta = X_s}. \tag{7.4}$$

This formula is easily obtained from (IV) after differentiating both sides with respect to θ and replacing nρ by s.
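EXAMPLE 8: ξ(θ, ρ) = θ^r, r ≥ 0.

$$\varphi(X_s, Y) = \frac{(n-2)!}{Y^{n-2}}\, \mathfrak{L}^{-1}\left[\frac{X_s^r}{s^{n-1}} - \frac{r X_s^{r-1}}{s^n};\ Y\right].$$

By virtue of 𝔏⁻¹(s^{−ν}) = y^{ν−1}/Γ(ν), ν > 0, we have

$$\varphi(X_s, Y) = X_s^r - \frac{r X_s^{r-1}\, Y}{n-1}.$$

For ξ(θ, ρ) = θ, φ(X_s, Y) = X_s − Y/(n − 1) (Epstein and Sobel [4], Corollary 8).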
EXAMPLE 9: ξ(θ, ρ) = ρ^r, r < n − 1.

$$\varphi(X_s, Y) = \frac{(n-2)!}{Y^{n-2}}\, \mathfrak{L}^{-1}\left[\frac{(s/n)^r}{s^{n-1}};\ Y\right] = \frac{(n-2)!}{n^r\, \Gamma(n - r - 1)\, Y^r}.$$

(See Epstein and Sobel [4], Corollary 6, for the case r = −1.)
EXAMPLE 10 (100p-th percentile): ξ(θ, ρ) = (1/ρ) ln[1/(1 − p)] + θ.

$$\varphi(X_s, Y) = X_s - \frac{Y}{n-1}\left[1 + n \ln(1 - p)\right].$$
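The estimators of Examples 8-10 are simple enough to test by simulation. The Monte Carlo sketch below (an illustration; seed, sample size, and parameter values are arbitrary) averages them over repeated exponential samples with θ = 2, ρ = 1.5, n = 6. Each average should be close to the corresponding target: θ, θ², ρ, or (1/ρ) ln 10 + θ for p = 0.9.

```python
import math
import random

def sufficient_stats(sample):
    """Return (X_s, Y, n) with X_s the sample minimum and Y = Xbar - X_s."""
    n = len(sample)
    x_s = min(sample)
    return x_s, sum(sample) / n - x_s, n

def est_theta_power(x_s, y, n, r):
    """Example 8: unbiased for theta^r."""
    return x_s ** r - r * x_s ** (r - 1) * y / (n - 1)

def est_rho_power(x_s, y, n, r):
    """Example 9: unbiased for rho^r, r < n - 1."""
    return math.factorial(n - 2) / (n ** r * math.gamma(n - r - 1) * y ** r)

def est_percentile(x_s, y, n, p):
    """Example 10: unbiased for (1/rho) ln(1/(1-p)) + theta."""
    return x_s - y / (n - 1) * (1.0 + n * math.log(1.0 - p))

def monte_carlo_means(theta, rho, n, p=0.9, reps=50_000, seed=2024):
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0, 0.0]
    for _ in range(reps):
        sample = [theta + rng.expovariate(rho) for _ in range(n)]
        x_s, y, m = sufficient_stats(sample)
        sums[0] += est_theta_power(x_s, y, m, 1)
        sums[1] += est_theta_power(x_s, y, m, 2)
        sums[2] += est_rho_power(x_s, y, m, 1)
        sums[3] += est_percentile(x_s, y, m, p)
    return [s / reps for s in sums]

print(monte_carlo_means(theta=2.0, rho=1.5, n=6))
```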


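EXAMPLE 11: ξ(θ, ρ) = [P(X ≤ z | θ, ρ)]^m, m = 1, 2, .... We have

$$\xi(\theta, \rho) = \left[1 - e^{-\rho(z - \theta)}\right]^m, \qquad \theta < z,$$

and zero elsewhere, so that

$$\frac{\partial \xi}{\partial \theta}(\theta, s/n) = -\frac{ms}{n}\left[1 - e^{-s(z-\theta)/n}\right]^{m-1} e^{-s(z-\theta)/n}.$$

Thus,

$$\varphi(\theta, y) = \frac{(n-2)!}{y^{n-2}}\, \mathfrak{L}^{-1}\left[\frac{\left[1 - e^{-s(z-\theta)/n}\right]^{m-1}\left[1 - \left(1 - \frac{m}{n}\right) e^{-s(z-\theta)/n}\right]}{s^{n-1}};\ y\right],$$

where, by the shift rule,

$$\mathfrak{L}^{-1}\left[\frac{e^{-s(z-\theta)/n}}{s^{n-1}};\ y\right] = \begin{cases} \frac{[y - (z-\theta)/n]^{n-2}}{(n-2)!}, & y > (z-\theta)/n, \\ 0, & y < (z-\theta)/n. \end{cases} \tag{7.5}$$

Now, a formula of the Bateman Project Tables ([1], p. 244) states that

$$\mathfrak{L}^{-1}\left[g(s)\left(1 - e^{-as}\right)^m;\ x\right] = \sum_{k=0}^{[x a^{-1}]} \binom{m}{k} (-1)^k\, \mathfrak{L}^{-1}[g(s);\ x - ak].$$

An application of this formula, together with (7.5), to the expression for φ(θ, y) yields

$$\varphi(X_s, Y) = 1 + \sum_k (-1)^k \binom{m}{k}\left(1 - \frac{k}{n}\right)\left(1 - \frac{k(z - X_s)}{nY}\right)^{n-2}, \tag{7.6}$$

with the summation extending over k = 1, 2, ..., min[m, nY(z − X_s)⁻¹]. Note that ξ(θ, ρ) is also P(X_L ≤ z | θ, ρ) if X_L is the maximum of m observations.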


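For m = 1, ξ(θ, ρ) = P(X ≤ z | θ, ρ), and

$$\varphi^*(z \mid X_s, Y) = \begin{cases} 0, & z < X_s, \\ 1 - \left(1 - \frac{1}{n}\right)\left(1 - \frac{z - X_s}{nY}\right)^{n-2}, & z \ge X_s,\ Y > (z - X_s)/n, \\ 1, & z \ge X_s,\ Y < (z - X_s)/n. \end{cases} \tag{7.7}$$

The estimator for P(X ∈ A | θ, ρ) is then

$$\varphi_A(X_s, Y) = \frac{1}{n} I_A(X_s) + \frac{(n-1)(n-2)}{n^2 Y} \int_{A \cap (X_s,\, X_s + nY)} \left(1 - \frac{z - X_s}{nY}\right)^{n-3} dz, \tag{7.8}$$

where I_A denotes the indicator function of A.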
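A simulation sketch of (7.6) and (7.7), assuming n ≥ 3 so that the exponent n − 2 is positive. The loop stops once k(z − X_s)/(nY) ≥ 1, mirroring the upper limit of the sum in (7.6). Function names, seed, and parameter values are illustrative choices, not from the paper.

```python
import math
import random

def est_cdf_power(x_s, y, n, z, m):
    """(7.6)-(7.7): unbiased estimator of [P(X <= z | theta, rho)]^m, n >= 3."""
    if z < x_s:
        return 0.0
    a = (z - x_s) / (n * y)
    total = 0.0
    for k in range(m + 1):
        base = 1.0 - k * a
        if base <= 0.0:        # all remaining terms vanish: k >= nY / (z - X_s)
            break
        total += (-1) ** k * math.comb(m, k) * (1.0 - k / n) * base ** (n - 2)
    return total

def check(theta, rho, n, z, m, reps=60_000, seed=7):
    """Return (Monte Carlo mean of the estimator, true value of xi)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        sample = [theta + rng.expovariate(rho) for _ in range(n)]
        x_s = min(sample)
        y = sum(sample) / n - x_s
        acc += est_cdf_power(x_s, y, n, z, m)
    truth = (1.0 - math.exp(-rho * (z - theta))) ** m if z > theta else 0.0
    return acc / reps, truth

for m in (1, 2):
    print(m, check(theta=1.0, rho=2.0, n=5, z=1.6, m=m))
```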


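Now consider the truncated exponential as an example of a Type I density (see Section 5) with

$$f_\theta(x) = k_1(\theta)\, h_1(x), \qquad \theta < x < b, \qquad k_1(\theta) = \frac{1}{e^{-\theta} - e^{-b}}, \quad h_1(x) = e^{-x}.$$

EXAMPLE 12: ξ(θ) = θ^r. Here

$$\xi'(X_s) = r X_s^{r-1}, \qquad n\, k_1(X_s)\, h_1(X_s) = \frac{n\, e^{-X_s}}{e^{-X_s} - e^{-b}},$$

and by Theorem 3

$$\varphi_b(X_s) = X_s^r - \frac{r}{n}\, X_s^{r-1}\left(1 - e^{-(b - X_s)}\right).$$

Remark 2, following Theorem 3, applies and we have the estimator

$$\varphi_\infty(X_s) = X_s^r - \frac{r}{n}\, X_s^{r-1}.$$

EXAMPLE 13: ξ(θ) = P(X ≤ z | θ).

$$\varphi^*(z \mid X_s) = \begin{cases} 0, & z < X_s < b, \\ 1 - \left(1 - \frac{1}{n}\right) \frac{e^{-(z - X_s)} - e^{-(b - X_s)}}{1 - e^{-(b - X_s)}}, & X_s \le z < b, \end{cases} \tag{7.9}$$

and

$$\varphi_A(X_s) = \frac{1}{n} I_A(X_s) + \left(1 - \frac{1}{n}\right) \int_{A \cap (X_s, b)} \frac{e^{-z}}{e^{-X_s} - e^{-b}}\, dz. \tag{7.10}$$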
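For the truncated exponential the minimum X_s has density n e^{−x}(e^{−x} − e^{−b})^{n−1}/(e^{−θ} − e^{−b})^n on (θ, b), so the expectations of the estimators in Examples 12 and 13 can be evaluated by quadrature. The sketch below (an illustration with arbitrary parameter values; Example 12 is taken with ξ(θ) = θ^r) does this with Simpson's rule and compares with θ^r and P(X ≤ z | θ).

```python
import math

def xs_density(x, theta, b, n):
    """Density of X_s for n draws from k_1(theta) e^{-x}, theta < x < b."""
    return (n * math.exp(-x) * (math.exp(-x) - math.exp(-b)) ** (n - 1)
            / (math.exp(-theta) - math.exp(-b)) ** n)

def simpson(g, a, b, steps=2000):
    h = (b - a) / steps
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, steps))
    return s * h / 3.0

def phi_power(x_s, b, n, r):
    """Example 12: estimator of theta^r."""
    return x_s ** r - (r / n) * x_s ** (r - 1) * (1.0 - math.exp(-(b - x_s)))

def phi_cdf(z, x_s, b, n):
    """(7.9): estimator of P(X <= z | theta); equals 0 when z < X_s."""
    if z < x_s:
        return 0.0
    ratio = (math.exp(-(z - x_s)) - math.exp(-(b - x_s))) / (1.0 - math.exp(-(b - x_s)))
    return 1.0 - (1.0 - 1.0 / n) * ratio

def expected(estimator, theta, b, n, upper):
    """E_theta[estimator(X_s)], integrating X_s over (theta, upper)."""
    return simpson(lambda x: estimator(x) * xs_density(x, theta, b, n), theta, upper)

theta, b, n, r, z = 0.5, 2.0, 4, 2, 1.2
print(expected(lambda x: phi_power(x, b, n, r), theta, b, n, b), theta ** r)
# phi_cdf vanishes for X_s > z, so the integral stops at z
print(expected(lambda x: phi_cdf(z, x, b, n), theta, b, n, z),
      (math.exp(-theta) - math.exp(-z)) / (math.exp(-theta) - math.exp(-b)))
```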


GAMMA DISTRIBUTION. The model of Birnbaum and Saunders [2] provides a framework for the discussion of results in this and the next subsection. They consider a structure consisting of m components which is subject to stress of some sort. Let S_λ be the length of life for the structure until λ ≤ m components have failed. It is shown that S_λ has the density

$$f_\lambda(x) = \frac{1}{\Gamma(\lambda)} \left(\int_0^x \gamma_\delta(t)\, dt\right)^{\lambda-1} \gamma_\delta(x)\, \exp\left\{-\int_0^x \gamma_\delta(t)\, dt\right\}. \tag{7.11}$$

They term γ_δ(t) the failure rate of a single component at time t under the damage function δ; it is assumed that

$$\gamma_\delta(t) = w'(t)\, \delta(t), \tag{7.12}$$

where w′(t) represents the deterioration of a component with time, and δ(t) represents the instantaneous damage at time t. The other assumption on which the above result depends is that the instantaneous damage to the remaining m − j components after j (< λ) have failed is inversely proportional to m − j.
The gamma distribution with scale parameter ρ and known parameter λ has the density

$$f_\rho(x) = \frac{\rho}{\Gamma(\lambda)}\, (\rho x)^{\lambda-1} e^{-\rho x}, \qquad 0 < x < \infty, \quad 0 < \rho < \infty,$$

and is the distribution of the life-length of a structure which survives until λ components have failed, and is subject to constant instantaneous damage with no deterioration; that is, γ_δ(t) = ρ. The statistic Σ X_i is sufficient and complete for ρ; hence, all estimators in this subsection (except (7.19)) have the property of minimum variance. The density of Σ_{i=1}^n X_i is known to be ρ g(ρx), where g(x) = x^{nλ−1} e^{−x}/Γ(nλ), 0 < x < ∞. Thus, by Theorem 1, Section 4,

$$\varphi\left(\sum X_i\right) = \frac{\Gamma(n\lambda)}{\Gamma(n\lambda - r)} \left(\frac{1}{\sum X_i}\right)^r \tag{7.13}$$

is a minimum variance unbiased estimator for ξ(ρ) = ρ^r, r < nλ. This result was also obtained for 0 < r < nλ by Washio et al. [16].
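Now consider

$$\xi(\rho) = \frac{\rho^{m\lambda}}{[\Gamma(\lambda)]^m} \left(\prod_{j=1}^m z_j\right)^{\lambda-1} \exp\left(-\rho \sum_{j=1}^m z_j\right),$$

the joint density of m gamma distributed random variables, with known parameter λ and unknown scale parameter ρ, evaluated at a fixed sample point (z_1, z_2, ..., z_m). Recall two properties of Mellin transforms:

(i) 𝔐[x^r ξ(x); s] = 𝔐[ξ(x); s + r],
(ii) 𝔐[ξ(ax); s] = (1/a^s) 𝔐[ξ(x); s].

The corollary to Theorem 2 is to be used; accordingly, we employ properties (i) and (ii) to calculate

$$\mathfrak{M}[x^{-1}\xi(x); s] = \frac{\left(\prod_{j=1}^m z_j\right)^{\lambda-1} \Gamma(m\lambda + s - 1)}{[\Gamma(\lambda)]^m \left(\sum_{j=1}^m z_j\right)^{m\lambda + s - 1}}, \qquad \mathfrak{M}[g(x); s] = \frac{\Gamma(n\lambda + s - 1)}{\Gamma(n\lambda)},$$

whence,

$$\mathfrak{M}[x^{-1}\varphi(1/x); s] = \frac{\left(\prod_{j=1}^m z_j\right)^{\lambda-1} \Gamma(m\lambda + s - 1)\, \Gamma(n\lambda)}{[\Gamma(\lambda)]^m \left(\sum_{j=1}^m z_j\right)^{m\lambda + s - 1} \Gamma(n\lambda + s - 1)}.$$

Let

$$B_{u,v}(x) = x^{u-1}(1 - x)^{v-1} / B(u, v)$$

for 0 < x < 1, and 0 elsewhere. It can be shown that

$$\mathfrak{M}[B_{u,v}(x); s] = \frac{\Gamma(u + s - 1)\, \Gamma(u + v)}{\Gamma(u)\, \Gamma(u + v + s - 1)}; \tag{7.15}$$

consequently, from property (ii),

$$x^{-1}\varphi(1/x) = \frac{\Gamma(m\lambda) \left(\prod_{j=1}^m z_j\right)^{\lambda-1}}{[\Gamma(\lambda)]^m \left(\sum_{j=1}^m z_j\right)^{m\lambda - 1}}\, B_{m\lambda,\, (n-m)\lambda}\left(x \sum_{j=1}^m z_j\right),$$

and finally for the estimator we have

$$\varphi\left(z_1, z_2, \ldots, z_m \,\middle|\, \sum X_i\right) = \frac{\Gamma(n\lambda)}{\Gamma((n-m)\lambda)\, [\Gamma(\lambda)]^m} \cdot \frac{\left(\prod_{j=1}^m z_j\right)^{\lambda-1}}{\left(\sum X_i\right)^{m\lambda}} \left(1 - \frac{\sum_{j=1}^m z_j}{\sum X_i}\right)^{(n-m)\lambda - 1} \tag{7.16}$$

for 0 < Σ_{j=1}^m z_j < Σ X_i, and 0 elsewhere. The special case m = 1 is easily handled. The estimators for P(X ≤ z | ρ) and P(X ∈ A | ρ) are, respectively,

$$\varphi^*\left(z \,\middle|\, \sum X_i\right) = \begin{cases} \int_0^{z/\sum X_i} B_{\lambda,\, (n-1)\lambda}(t)\, dt, & 0 < z < \sum X_i, \\ 1, & z \ge \sum X_i, \end{cases} \tag{7.17}$$

and

$$\varphi_A\left(\sum X_i\right) = \int_{A \cap (0,\, \sum X_i)} \frac{\left(\frac{z}{\sum X_i}\right)^{\lambda-1} \left(1 - \frac{z}{\sum X_i}\right)^{(n-1)\lambda - 1}}{\sum X_i \cdot B(\lambda, (n-1)\lambda)}\, dz. \tag{7.18}$$

Numerical calculations can be carried out with the Table of the Incomplete Beta Function [13].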

EXAMPLE 14: Consider the following sample of life-lengths in hours for a structure with 3 components:

 592   198   458   780   132
1012   884   530   582   606

Suppose we wish to estimate P(X ≤ 100 | ρ), the probability that the life of a given structure will not exceed 100 hours. Here

n = 10,  λ = 3,  (n − 1)λ = 27,  Σ X_i = 5774.

The estimate is then (see (7.17))

$$\varphi^*(100 \mid 5774) = 1 - I_{.9827}(27, 3) = .015.$$

One might also want to know one of the percentiles, say the 99th. We then estimate b_{.99}/ρ, where b_{.99} is the 99th percentile of the unparametrized distribution, that is, for λ = 3, ρ = 1. The above formula (7.13), with r = −1, gives the estimate

$$\varphi(5774) = \frac{1}{10(3)}\,(5774)(8.40) \approx 1617 \text{ hours}.$$
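The numbers in Example 14 can be reproduced in a few lines. This sketch (an illustration, not the author's procedure) uses the identity I_x(a, b) = P(Binomial(a + b − 1, x) ≥ a), valid for positive integers a and b, in place of a table, and finds b_.99 for λ = 3 by bisection. Evaluated this way the estimate of P(X ≤ 100 | ρ) comes out near 0.0136, a little below the tabulated .015, which appears consistent with linear interpolation between table entries at .98 and .99; b_.99 ≈ 8.406 gives a percentile estimate near 1618 hours.

```python
import math

def reg_inc_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b,
    via I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    big_n = a + b - 1
    return sum(math.comb(big_n, j) * x ** j * (1.0 - x) ** (big_n - j)
               for j in range(a, big_n + 1))

def gamma_cdf_int(x, lam):
    """P(G <= x) for G ~ Gamma(shape lam, rate 1), lam a positive integer."""
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(lam))

def gamma_quantile_int(p, lam, lo=0.0, hi=200.0, iters=200):
    """Solve gamma_cdf_int(b, lam) = p for b by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lives = [592, 198, 458, 780, 132, 1012, 884, 530, 582, 606]
n, lam = len(lives), 3
total = sum(lives)                                           # sum of X_i
p_hat = reg_inc_beta_int(100 / total, lam, (n - 1) * lam)    # (7.17) at z = 100
b99 = gamma_quantile_int(0.99, lam)                          # b_.99 for lambda = 3, rho = 1
pct_hat = b99 * total / (n * lam)                            # b_.99 times (7.13) with r = -1
print(total, p_hat, b99, pct_hat)
```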
Let us consider for a moment the case of the gamma density f(x − θ) = (x − θ)^{λ−1} e^{−(x−θ)}/Γ(λ) with translation parameter. The following unbiased estimators, which do not have the minimum variance property, have been calculated from the formula (6.13) for E_0(X_s):

$$\varphi(X_s) = \begin{cases} X_s - \sum_{j=0}^{n} \binom{n}{j} \frac{j!}{n^{j+1}}, & \lambda = 2, \\ X_s - \frac{1}{n} \sum_{j=0}^{n} \sum_{k=0}^{j} \binom{n}{j} \binom{j}{k} \frac{(j+k)!}{2^k\, n^{j+k}}, & \lambda = 3. \end{cases} \tag{7.19}$$

For λ = 1 we have the exponential case, and the estimator is known to be X_s − (1/n). Each higher integral value of λ produces an expression with an additional summation sign.
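(7.19) rests on expanding [1 − F(x)]^n in (6.13). The sketch below (an illustration that covers only λ = 1, 2, 3) compares these closed forms with direct Simpson integration of (6.13) for the Gamma(λ, 1) survival function.

```python
import math

def e0_closed_form(n, lam):
    """E_0(X_s) as in (7.19), for lam in {1, 2, 3}."""
    if lam == 1:
        return 1.0 / n
    if lam == 2:
        return sum(math.comb(n, j) * math.factorial(j) / n ** (j + 1) for j in range(n + 1))
    if lam == 3:
        return sum(math.comb(n, j) * math.comb(j, k) * math.factorial(j + k)
                   / (2 ** k * n ** (j + k + 1))
                   for j in range(n + 1) for k in range(j + 1))
    raise ValueError("this sketch covers lam = 1, 2, 3 only")

def e0_quadrature(n, lam, upper=60.0, steps=20000):
    """(6.13): integral_0^upper [1 - F(x)]^n dx for the Gamma(lam, 1) law (Simpson)."""
    def survival(x):
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(lam))
    h = upper / steps
    s = survival(0.0) ** n + survival(upper) ** n
    s += sum((4 if i % 2 else 2) * survival(i * h) ** n for i in range(1, steps))
    return s * h / 3.0

def phi_719(x_s, n, lam):
    """Unbiased (not minimum variance) estimator X_s - E_0(X_s) of theta."""
    return x_s - e0_closed_form(n, lam)

for lam in (1, 2, 3):
    print(lam, [round(e0_closed_form(n, lam), 6) for n in (1, 2, 5)])
```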







WEIBULL DISTRIBUTION. This distribution fits the Birnbaum-Saunders model for the failure rate γ_δ(t) = αρ^α t^{α−1}, which may arise in a variety of ways. For example, the instantaneous damage may vary as a power of the time, with no deterioration in the component, or vice versa; it is also possible for both w(t) and δ(t) to vary as powers of the time. There appears to be no way to distinguish between these possibilities with their methods. We consider first the Weibull density with parameter ρ and fixed α,

$$f_\rho(x) = \alpha\rho\, (\rho x)^{\alpha-1} e^{-(\rho x)^\alpha}, \qquad x > 0,\ \rho > 0,\ \alpha \ge 1. \tag{7.20}$$

The statistic Σ X_i^α is sufficient for ρ and has density ρ^α g(ρ^α x), with g(x) = x^{n−1} e^{−x}/Γ(n). This follows from the fact that (ρX)^α has an exponential distribution with parameter 1 whenever X has the Weibull distribution with parameter ρ (see the first part of Section 4). The family of distributions for Σ X_i^α is known to be complete, and hence all estimators (except (7.28)) for functions of ρ will have minimum variance.

By Theorem 1, Section 4,

$$\varphi\left(\sum X_i^\alpha\right) = \frac{\Gamma(n)}{\Gamma(n - r/\alpha)} \left(\frac{1}{\sum X_i^\alpha}\right)^{r/\alpha}, \qquad r < n\alpha, \tag{7.21}$$

is the proper estimator for ρ^r. The joint density of m independently distributed observations, evaluated at a fixed point, can be estimated in a manner entirely analogous to the gamma case.


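$$\xi(\rho) = \alpha^m \rho^{m\alpha} \left(\prod_{j=1}^m z_j\right)^{\alpha-1} \exp\left(-\sum_{j=1}^m (\rho z_j)^\alpha\right), \qquad m = 1, 2, \ldots, \tag{7.22}$$

$$\mathfrak{M}[x^{-1}\varphi(1/x); s] = \frac{\alpha^m \left(\prod_{j=1}^m z_j\right)^{\alpha-1} \Gamma(s + m - 1)\, \Gamma(n)}{\left(\sum_{j=1}^m z_j^\alpha\right)^{s + m - 1} \Gamma(s + n - 1)},$$

and finally

$$\varphi\left(z_1, \ldots, z_m \,\middle|\, \sum X_i^\alpha\right) = \frac{\alpha^m\, \Gamma(n)}{\Gamma(n - m)} \cdot \frac{\left(\prod_{j=1}^m z_j\right)^{\alpha-1}}{\left(\sum X_i^\alpha\right)^m} \left(1 - \frac{\sum_{j=1}^m z_j^\alpha}{\sum X_i^\alpha}\right)^{n-m-1} \tag{7.23}$$

for 0 < Σ_{j=1}^m z_j^α < Σ X_i^α, and 0 otherwise. For the cases ξ(ρ) = P(X ≤ z | ρ) and P(X ∈ A | ρ), respectively, we then have

$$\varphi^*\left(z \,\middle|\, \sum X_i^\alpha\right) = \begin{cases} 1 - \left(1 - \frac{z^\alpha}{\sum X_i^\alpha}\right)^{n-1}, & 0 < z < \left(\sum X_i^\alpha\right)^{1/\alpha}, \\ 1, & z \ge \left(\sum X_i^\alpha\right)^{1/\alpha}, \end{cases} \tag{7.24}$$

$$\varphi_A\left(\sum X_i^\alpha\right) = \int_{A \cap (0,\, (\sum X_i^\alpha)^{1/\alpha})} \frac{(n-1)\alpha z^{\alpha-1}}{\sum X_i^\alpha} \left(1 - \frac{z^\alpha}{\sum X_i^\alpha}\right)^{n-2} dz. \tag{7.25}$$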







The 100p-th percentile of the Weibull distribution is

$$\xi(\rho) = \frac{1}{\rho}\left[\ln\left(\frac{1}{1-p}\right)\right]^{1/\alpha}, \tag{7.26}$$

for which the estimator is

$$\varphi\left(\sum X_i^\alpha\right) = \left[\ln\left(\frac{1}{1-p}\right)\right]^{1/\alpha} \frac{\Gamma(n)}{\Gamma(n + 1/\alpha)} \left(\sum X_i^\alpha\right)^{1/\alpha}. \tag{7.27}$$

For the location parameter case, formula (6.13) of Section 6, previously used for the gamma distribution (λ = 2, 3), provides the estimator for θ,

$$\varphi(X_s) = X_s - \frac{\Gamma(1/\alpha)}{\alpha\, n^{1/\alpha}}, \tag{7.28}$$

which does not possess the minimum variance property.
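A Monte Carlo sketch of (7.21) with r = 1, (7.24), and (7.27). Python's random.weibullvariate(alpha, beta) takes a scale alpha and a shape beta, so scale 1/ρ and shape α reproduce (7.20). The parameter values, seed, and replication count are arbitrary illustrative choices.

```python
import math
import random

def weibull_estimators(sample, alpha, z, p):
    """Return the estimates (7.21) with r = 1, (7.24) at z, and (7.27) at p."""
    n = len(sample)
    t = sum(x ** alpha for x in sample)                      # sum of X_i^alpha
    rho_hat = math.gamma(n) / math.gamma(n - 1.0 / alpha) * t ** (-1.0 / alpha)
    cdf_hat = (1.0 - (1.0 - z ** alpha / t) ** (n - 1)) if z ** alpha < t else 1.0
    pct_hat = (math.log(1.0 / (1.0 - p)) ** (1.0 / alpha)
               * math.gamma(n) / math.gamma(n + 1.0 / alpha) * t ** (1.0 / alpha))
    return rho_hat, cdf_hat, pct_hat

def monte_carlo(rho, alpha, n, z, p, reps=50_000, seed=11):
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(reps):
        # weibullvariate(scale, shape): CDF 1 - exp(-(x * rho)^alpha) for scale 1/rho
        sample = [rng.weibullvariate(1.0 / rho, alpha) for _ in range(n)]
        for i, value in enumerate(weibull_estimators(sample, alpha, z, p)):
            sums[i] += value
    return [s / reps for s in sums]

rho, alpha, n, z, p = 0.5, 2.0, 8, 1.5, 0.9
print(monte_carlo(rho, alpha, n, z, p))
print(rho, 1.0 - math.exp(-(rho * z) ** alpha), math.log(1.0 / (1.0 - p)) ** (1.0 / alpha) / rho)
```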
REFERENCES

[1] BATEMAN MANUSCRIPT PROJECT, California Institute of Technology, Tables of Integral Transforms, Vol. 1, A. Erdélyi, Editor, McGraw-Hill Book Co., New York, 1954.
[2] Z. W. BIRNBAUM AND S. C. SAUNDERS, "A statistical model for life-length of materials", J. Amer. Stat. Assn., Vol. 53 (1958), pp. 151-160.
[3] R. C. DAVIS, "On minimum variance in nonregular estimation", Ann. Math. Stat., Vol. 22 (1951), pp. 43-57.
[4] B. EPSTEIN AND M. SOBEL, "Some theorems relevant to life testing from an exponential distribution", Ann. Math. Stat., Vol. 25 (1954), pp. 373-381.
[5] A. M. FREUDENTHAL AND E. J. GUMBEL, "Minimum life in fatigue", J. Amer. Stat. Assn., Vol. 49 (1954), pp. 575-597.
[6] M. A. GIRSHICK AND L. J. SAVAGE, "Bayes and minimax estimates for quadratic loss functions", Proceedings of the Second Berkeley Symposium, University of California Press, 1951, pp. 53-73.
[7] B. V. GNEDENKO AND A. N. KOLMOGOROV, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley Publishing Co., Cambridge, Mass., 1954.
[8] WOLFGANG GRÖBNER AND NIKOLAUS HOFREITER, Integraltafel, zweiter Teil: Bestimmte Integrale, Springer-Verlag, Wien und Innsbruck, 1950.
[9] I. I. HIRSCHMAN AND D. V. WIDDER, The Convolution Transform, Princeton University Press, Princeton, New Jersey, 1955.
[10] T. KITAGAWA, "The operational calculus and the estimation of functions of parameters admitting sufficient statistics", Bull. Math. Stat., Vol. 6 (1956), pp. 95-108.
[11] A. N. KOLMOGOROV, "Unbiased estimates", Izvestiya Akad. Nauk SSSR, Seriya Matematičeskaya, Vol. 14 (1950), pp. 303-326; Amer. Math. Soc. Translation No. 98.
[12] B. O. KOOPMAN, "On distributions admitting sufficient statistics", Trans. Amer. Math. Soc., Vol. 39 (1936), pp. 399-409.
[13] KARL PEARSON, Tables of the Incomplete Beta Function, Cambridge University Press, Cambridge, England, 1934.
[14] E. J. G. PITMAN, "Sufficient statistics and intrinsic accuracy", Proc. Camb. Philos. Soc., Vol. 32 (1936), pp. 567-579.
[15] E. J. G. PITMAN, "The estimation of the location and scale parameters of a continuous population of any given form", Biometrika, Vol. 30 (1939), pp. 391-421.
[16] Y. WASHIO, H. MORIMOTO, AND N. IKEDA, "Unbiased estimation based on sufficient statistics", Bull. Math. Stat., Vol. 6 (1956), pp. 69-94.
[17] D. V. WIDDER, The Laplace Transform, Princeton University Press, Princeton, New Jersey, 1946.





AN EXTENSION OF THE CRAMÉR-RAO INEQUALITY¹


By JOHN J. GART²


Virginia Polytechnic Institute 


1. Review of the literature and summary. Cramér ([6], p. 474 ff.), Darmois [8], Fréchet [10], and Rao [14] derived independently a lower bound for the mean square error of an estimate t of a parameter α which appears in a frequency function of a specified form. This expression, alternately termed the Cramér-Rao inequality or the information limit, is

$$E(t - \alpha)^2 \ge [E(t) - \alpha]^2 + \frac{\left[\partial E(t)/\partial \alpha\right]^2}{E\left(\partial \ln \phi/\partial \alpha\right)^2}, \tag{1.1}$$

where φ is the likelihood of the sample. The expression E(∂ ln φ/∂α)² is called the information on α and is sometimes denoted by I(α). Under rather general conditions it can be shown to equal E(−∂² ln φ/∂α²).

The equality in (1.1) is reached if and only if

$$\phi = \phi_1\, e^{t V(\alpha) + W(\alpha)}, \tag{1.2}$$

where t and φ_1 are functions of the observations alone and V(α) and W(α) are functions of α alone. By the results of Pitman [13] and Koopman [12], the form of (1.2) implies that t must be a sufficient statistic. The fact that this form of the likelihood yields a minimum variance estimate was first pointed out by Aitken and Silverstone [1]. If we have n observations which are independently and identically distributed, the frequency function of the underlying population must be of the so-called Pitman-Koopman form,

$$f(x; \alpha) = u(\alpha)\, h(x)\, e^{g(x) v(\alpha)}, \tag{1.3}$$

and t must be a function of Σ_{i=1}^n g(x_i) for the equality in (1.1) to hold.

Several extensions of the basic inequality have been derived. Bhattacharyya 
[4] and Chapman and Robbins [5] have derived results which yield more stringent 
inequalities in certain instances. Wolfowitz [21] has extended the result to 
sequential sampling situations. Cramér [7], Darmois [8], and Barankin [2] have 
considered joint bounds on sets of estimates of parameters and Hammersley 
[11] has derived a lower bound of the mean square error of an estimate for the
situation in which the parameter to be estimated can only assume discrete 
values. Barankin [3] has also considered lower bounds on the general absolute 
central moments of the estimate. 

All these results assume that the parameters involved are constants. Here we 


Received May 21, 1957; revised May 25, 1958. 
¹ Research sponsored by the National Science Foundation under grant NSF G-1858.
² Present address: Department of Biostatistics, The Johns Hopkins University.


367 





368 JOHN J. GART 


shall consider the case where the parameters are random variables. Thus the 
lower bound of the mean square error of an estimate will take into account the 
variability due to both the observations and the parameters involved. Necessary 
and sufficient conditions for equality of the extended inequality are derived. 
Most unfavorable distributions, i.e., distributions which maximize the lower 
bound, are defined, and several examples are given. Extensions analogous to 
those of Bhattacharyya [4] and Wolfowitz [21] are also considered. Finally, 
bounds on the variance of linear estimates of the mean of the parameter are 
derived. 


2. Notation. Consider a frequency function f(x | θ), where θ = (θ_1, θ_2, ..., θ_s), the function being specified when θ is specified. Further, θ is a random variable having the distribution G(θ) defined over a non-degenerate range A_θ. Let X = (x_1, x_2, ..., x_n) be a random sample from a randomly chosen population having the specified frequency function. Let t_k = t_k(X) be an estimate of θ_k, 1 ≤ k ≤ s, functionally independent of θ. Denote E(t_k | θ) by ψ_k(θ) and the conditional likelihood of the sample by φ(X | θ), which in general will be ∏_{i=1}^n f(x_i | θ).


3. The continuous case.³ If f(x | θ) is a density, assume ∂φ/∂θ_k exists for all θ in A_θ, and |∂φ/∂θ_k| < H(X), where H and t_k H are integrable over R_n, the range of X, which is independent of θ. We have

$$1 = \int_{R_n} \phi\, dX \tag{3.1}$$

and

$$\psi_k(\theta) = \int_{R_n} t_k\, \phi\, dX. \tag{3.2}$$

By the assumptions just made (see Cramér [6], p. 66 and p. 475), we may differentiate under the integral signs in (3.1) and (3.2) and obtain





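$$0 = \int_{R_n} \frac{\partial \phi}{\partial \theta_k}\, dX = \int_{R_n} \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dX \tag{3.3}$$

and

$$\frac{\partial \psi_k(\theta)}{\partial \theta_k} = \int_{R_n} t_k \frac{\partial \phi}{\partial \theta_k}\, dX = \int_{R_n} t_k \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dX. \tag{3.4}$$

Finding expectations of (3.3) and (3.4) with respect to θ, we have

$$0 = \int_{A_\theta} \int_{R_n} \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dX\, dG(\theta) \tag{3.5}$$

and

$$E\left(\frac{\partial \psi_k(\theta)}{\partial \theta_k}\right) = \int_{A_\theta} \frac{\partial \psi_k(\theta)}{\partial \theta_k}\, dG(\theta) = \int_{A_\theta} \int_{R_n} t_k \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dX\, dG(\theta). \tag{3.6}$$

³ Results analogous to those of this and the two succeeding paragraphs have been obtained by Schützenberger [15] for the a posteriori distribution of θ.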





CRAMER-RAO INEQUALITY 


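By the Schwarz inequality we may write

$$\left\{\int_{A_\theta} \int_{R_n} \left[t_k - \theta_k - E[\psi_k(\theta)] + E(\theta_k)\right]^2 \phi\, dX\, dG(\theta)\right\} \left\{\int_{A_\theta} \int_{R_n} \left(\frac{\partial \ln \phi}{\partial \theta_k}\right)^2 \phi\, dX\, dG(\theta)\right\} \ge \left\{\int_{A_\theta} \int_{R_n} \left[t_k - \theta_k - E[\psi_k(\theta)] + E(\theta_k)\right] \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dX\, dG(\theta)\right\}^2. \tag{3.7}$$

In view of (3.3), (3.5), and (3.6), (3.7) may be written⁴

$$\operatorname{Var}(t_k - \theta_k)\; EE\left[\left(\frac{\partial \ln \phi}{\partial \theta_k}\right)^2 \,\middle|\, \theta\right] \ge E^2\left(\frac{\partial \psi_k(\theta)}{\partial \theta_k}\right),$$

and if EE[(∂ ln φ/∂θ_k)² | θ] ≠ 0, then

$$\operatorname{Var}(t_k - \theta_k) \ge \frac{E^2\left(\partial \psi_k(\theta)/\partial \theta_k\right)}{E[I_k(\theta)]},$$

where I_k(θ) = E[(∂ ln φ/∂θ_k)² | θ]. Since Var(t_k − θ_k) = EE[(t_k − θ_k)² | θ] − E²[ψ_k(θ) − θ_k], we may write

$$EE\left[(t_k - \theta_k)^2 \mid \theta\right] \ge E^2[\psi_k(\theta) - \theta_k] + \frac{E^2\left(\partial \psi_k(\theta)/\partial \theta_k\right)}{E[I_k(\theta)]}. \tag{3.8}$$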

If ψ_k(θ) = θ_k, (3.8) may be written

$$EE\left[(t_k - \theta_k)^2 \mid \theta\right] \ge \frac{1}{E[I_k(\theta)]}. \tag{3.9}$$

When |∂²φ/∂θ_k²| < K(X), where K(X) is integrable over R_n, it is well known that

$$E\left[\left(\frac{\partial \ln \phi}{\partial \theta_k}\right)^2 \,\middle|\, \theta\right] = E\left[-\frac{\partial^2 \ln \phi}{\partial \theta_k^2} \,\middle|\, \theta\right].$$

Then we can write I_k(θ) = E[−∂² ln φ/∂θ_k² | θ] in (3.8) and (3.9). Since I_k(θ) is called the amount of information, E[I_k(θ)] may logically be termed the mean amount of information.

It should be noted that the derivation of these inequalities is equally applicable to samples from multivariate populations.
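To make (3.8) concrete, here is a small example of our own (not from the paper): θ ~ N(μ, τ²), X_1, ..., X_n | θ i.i.d. N(θ, σ²), so that I(θ) = n/σ² does not depend on θ, and t = cX̄ with ψ(θ) = cθ. The left member of (3.8) is then c²σ²/n + (c − 1)²(μ² + τ²) and the right member is (c − 1)²μ² + c²σ²/n, so the gap is (c − 1)²τ², which vanishes for the unbiased choice c = 1, in line with (3.9). The sketch checks these expressions and estimates the left member by simulation; all names and values are illustrative.

```python
import random

def lhs_38(c, mu, tau, sigma, n):
    """EE[(t - theta)^2 | theta] for t = c * Xbar, computed exactly."""
    return c * c * sigma ** 2 / n + (c - 1.0) ** 2 * (mu ** 2 + tau ** 2)

def rhs_38(c, mu, tau, sigma, n):
    """Right member of (3.8): E^2[psi - theta] + E^2(dpsi/dtheta) / E[I(theta)]."""
    mean_info = n / sigma ** 2            # I(theta) = n / sigma^2 for every theta
    return ((c - 1.0) * mu) ** 2 + c * c / mean_info

def mc_lhs_38(c, mu, tau, sigma, n, reps=100_000, seed=3):
    """Monte Carlo estimate of lhs_38: draw theta from G, then Xbar given theta."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        theta = rng.gauss(mu, tau)
        xbar = rng.gauss(theta, sigma / n ** 0.5)   # Xbar | theta ~ N(theta, sigma^2 / n)
        acc += (c * xbar - theta) ** 2
    return acc / reps

mu, tau, sigma, n = 1.0, 0.5, 2.0, 8
for c in (1.0, 0.8):
    print(c, lhs_38(c, mu, tau, sigma, n), rhs_38(c, mu, tau, sigma, n),
          mc_lhs_38(c, mu, tau, sigma, n))
```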


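4. The discrete case. Suppose that f(x | θ) is a discrete frequency function whose range, R_n, may be finite or denumerably infinite but independent of θ_k. Assume φ is a differentiable function of θ_k for all X in R_n and θ in A_θ, and that Σ_{x_1} ⋯ Σ_{x_n} ∂φ/∂θ_k and Σ_{x_1} ⋯ Σ_{x_n} t_k (∂φ/∂θ_k) converge uniformly in A_θ. By operations similar to those employed in paragraph 3, we find

$$\sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} \frac{\partial \ln \phi}{\partial \theta_k}\, \phi = 0 \tag{4.1}$$

and

$$\frac{\partial \psi_k(\theta)}{\partial \theta_k} = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} t_k\, \frac{\partial \ln \phi}{\partial \theta_k}\, \phi, \tag{4.2}$$

since the assumptions just made allow differentiation under the summation signs. By the Schwarz inequality we may write

$$\left\{\int_{A_\theta} \sum_{x_1} \cdots \sum_{x_n} \left[t_k - \theta_k - E[\psi_k(\theta)] + E(\theta_k)\right]^2 \phi\, dG(\theta)\right\} \left\{\int_{A_\theta} \sum_{x_1} \cdots \sum_{x_n} \left(\frac{\partial \ln \phi}{\partial \theta_k}\right)^2 \phi\, dG(\theta)\right\} \ge \left\{\int_{A_\theta} \sum_{x_1} \cdots \sum_{x_n} \left[t_k - \theta_k - E[\psi_k(\theta)] + E(\theta_k)\right] \frac{\partial \ln \phi}{\partial \theta_k}\, \phi\, dG(\theta)\right\}^2.$$

Following steps analogous to those in paragraph 3, we arrive at (3.8) and (3.9) for the discrete case.

⁴ EE signifies taking the expectation with respect to X for fixed θ and then with respect to θ.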


5. Conditions for Equality. The conditions under which the equalities in (3.8) and (3.9) hold are set forth in the following three theorems.

THEOREM 1. If

(i) Pr{E[(t_k − θ_k)² | θ] = c_1} = 1,
(ii) Pr[ψ_k(θ) = c_2] = 1,
(iii) Pr[θ_k = c_3] = 1,
(iv) Pr[∂ψ_k(θ)/∂θ_k = c_4] = 1, and
(v) Pr[I_k(θ) = c_5] = 1,

where c_i, i = 1, 2, ..., 5, are constants, then the equality in (3.8) holds if and only if t_k is a sufficient estimate of θ_k.

PROOF. Under the conditions of the theorem (3.8) reduces to (1.1), for which it has been shown by Rao [14] that the equality holds if and only if t_k is a sufficient estimate of θ_k.

THEOREM 2. If $t_k$ is an unbiased sufficient estimate of $\theta_k$, then the equality in (3.9) holds if and only if $\Pr[I_k(\Theta) = c_5] = 1$, where $c_5$ is a constant.

PROOF. Since $t_k$ is an unbiased sufficient estimate of $\theta_k$, we have from Rao [14] that $E[(t_k-\theta_k)^2 \mid \Theta] = I_k(\Theta)^{-1}$. Taking expectations with respect to $\Theta$, we have

$$EE[(t_k-\theta_k)^2 \mid \Theta] = E[I_k(\Theta)^{-1}]. \tag{5.1}$$

Now equality in (3.9) requires that

$$EE[(t_k-\theta_k)^2 \mid \Theta] = \frac{1}{E[I_k(\Theta)]}. \tag{5.2}$$





CRAMER-RAO INEQUALITY 


Combining (5.1) and (5.2), we have $E[I_k(\Theta)^{-1}] = 1/E[I_k(\Theta)]$, or

$$E[I_k(\Theta)^{-1}]\,E[I_k(\Theta)] = 1. \tag{5.3}$$


This can be written

$$\left\{\int_{A_s} I_k(\theta)^{-1}\,dG(\theta)\right\}\left\{\int_{A_s} I_k(\theta)\,dG(\theta)\right\} = 1. \tag{5.4}$$

Now by the Schwarz inequality we have

$$\left\{\int_{A_s} I_k(\theta)^{-1}\,dG(\theta)\right\}\left\{\int_{A_s} I_k(\theta)\,dG(\theta)\right\} \ge \left\{\int_{A_s} dG(\theta)\right\}^2 = 1. \tag{5.5}$$

Obviously, equality in (5.5) is equivalent to (5.4). But the equality in (5.5) is achieved if and only if, for a constant $c_5$ independent of $\theta$, $c_5[I_k(\Theta)]^{-1/2} = [I_k(\Theta)]^{1/2}$ with probability one; that is, if and only if

$$\Pr[I_k(\Theta) = c_5] = 1,$$

which proves the theorem.
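The inequality (5.5) can be checked with exact rational arithmetic. In the sketch below the distribution of the value of $I_k(\Theta)$ is a hypothetical discrete one; the product $E[I_k(\Theta)^{-1}]\,E[I_k(\Theta)]$ equals 1 when $I_k(\Theta)$ is degenerate and exceeds 1 otherwise.

```python
from fractions import Fraction

def info_product(dist):
    """E[I^{-1}] * E[I] for a discrete distribution given as (value of I, probability) pairs."""
    e_inv = sum(p / i for i, p in dist)
    e_i = sum(p * i for i, p in dist)
    return e_inv * e_i

# Hypothetical distributions of I_k(Theta), kept exact with Fraction
constant = [(Fraction(3), Fraction(1))]
spread = [(Fraction(1), Fraction(1, 2)), (Fraction(4), Fraction(1, 2))]

print(info_product(constant))  # 1
print(info_product(spread))    # 25/16
```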

Before proceeding to Theorem 3, we cite the following definition.

DEFINITION. Any pair $\phi(X \mid \theta)$ and $G(\Theta)$ for which any one of the assumptions (i)-(iv) inclusive of Theorem 1 does not hold for the $\theta_k$ under consideration is termed the non-trivial estimation case.

THEOREM 3. For the non-trivial estimation case the equality in (3.8) is achieved if and only if $t_k$ is an unbiased sufficient estimate of $\theta_k$ which is normally distributed with constant variance equal to $I_k(\theta)^{-1}$. Consequently the equality in (3.9) is achieved under the same conditions.

PROOF. For the non-trivial estimation case the equality in (3.7), and consequently in (3.8) and (3.9), is achieved if and only if there exists a $\lambda$ independent of $X$ and $\Theta$ such that

$$\lambda\,\frac{\partial\ln\phi}{\partial\theta_k} = t_k - \theta_k - E[\psi_k(\Theta)] + E(\theta_k)$$

for almost all $X$ in $R_n$ and $\theta$ in $A_s$. Integrating, we have

$$\lambda\ln\phi = \theta_k t_k - \theta_k^2/2 - \theta_k E[\psi_k(\Theta)] + \theta_k E(\theta_k) + C_1(X, \theta^*),$$

where $\theta^* = (\theta_1, \theta_2, \ldots, \theta_{k-1}, \theta_{k+1}, \ldots, \theta_s)$. We thus have

$$\phi = C_2(X, \theta^*)\exp\left\{\frac{1}{\lambda}\left[\theta_k t_k - \frac{\theta_k^2}{2} - \theta_k E[\psi_k(\Theta)] + \theta_k E(\theta_k)\right]\right\}.$$

This is a special case of the form, found by Pitman [13] and Koopman [12], wherein $t_k$ is a sufficient statistic for $\theta_k$. Integrating both sides of the above equation over $R_n$, we have


$$\exp\left\{\frac{\theta_k}{\lambda}\bigl[E(\theta_k) - E(\psi_k(\Theta))\bigr]\right\}\int_{R_n} C_2(X, \theta^*)\exp(\theta_k t_k/\lambda)\,dX = \exp(\theta_k^2/2\lambda).$$

Make the change of variables in the integral $z_i = z_i(X)$, $i = 1, 2, \ldots, n-1$, $t_k = t_k(X)$, where $t_k(X)$ and the $z_i(X)$ are unique, continuous, and possess continuous partial derivatives. Further, the transformation is one-to-one. Then we have

$$\exp\left\{\frac{\theta_k}{\lambda}\bigl[E(\theta_k) - E(\psi_k(\Theta))\bigr]\right\}\int_{B_n} C_3(Z, t_k, \theta^*)\exp(\theta_k t_k/\lambda)\,dZ\,dt_k = \exp(\theta_k^2/2\lambda),$$

where $B_n$ is the range of $(Z, t_k) = (z_1, z_2, \ldots, z_{n-1}, t_k)$. If we integrate out $Z$, we have


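$$\int_{b_1} C_4(t_k, \theta^*)\exp(\theta_k t_k/\lambda)\,dt_k = \exp\left\{\frac{\theta_k^2}{2\lambda} - \frac{\theta_k}{\lambda}\bigl[E(\theta_k) - E(\psi_k(\Theta))\bigr]\right\}, \tag{5.6}$$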


where $b_1$, the range of $t_k$, may be taken from $-\infty$ to $\infty$. Then the left-hand side of (5.6) is a bilateral Laplace transform of $C_4(t_k, \theta^*)$ with argument $\theta_k/\lambda$. Recall that $\theta_k$ has a non-degenerate range, say $\gamma_1 < \theta_k < \gamma_2$. Obviously $e^{\theta_k^2/2\lambda}$ exists at $\theta_k = \gamma_1 + \epsilon_1$ and $\theta_k = \gamma_2 - \epsilon_2$, where $\epsilon_1, \epsilon_2 > 0$, such that $\epsilon_1 + \epsilon_2 < \gamma_2 - \gamma_1$. Thus we can apply the theorem of Widder ([19], p. 238) and conclude that the integral in (5.6) converges for $\theta_k$ in the vertical strip of the complex plane $\gamma_1 + \epsilon_1 < \theta_k < \gamma_2 - \epsilon_2$. Thus we can apply the uniqueness theorem of the bilateral Laplace transform (see Widder [19], p. 243) and conclude that $C_4(t_k, \theta^*) = (1/\sqrt{2\pi\lambda})\exp(-t_k^2/2\lambda)$. Therefore, for equality, the frequency function of $t_k$ must be

$$h(t_k) = \frac{1}{\sqrt{2\pi\lambda}}\exp\left[-\frac{(t_k-\theta_k)^2}{2\lambda}\right], \qquad -\infty < t_k < \infty,$$

where obviously $\lambda = I_k(\theta)^{-1}$. Further, the equality holds regardless of the form of the marginal distribution of $\theta_k$.
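The Laplace transform pair behind this last step can be verified directly. As a worked check (not part of the original argument), completing the square in the exponent gives

$$\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\lambda}}e^{-t^2/2\lambda}\,e^{\theta t/\lambda}\,dt = e^{\theta^2/2\lambda}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\lambda}}e^{-(t-\theta)^2/2\lambda}\,dt = e^{\theta^2/2\lambda},$$

since $-t^2/2\lambda + \theta t/\lambda = -(t-\theta)^2/2\lambda + \theta^2/2\lambda$. Thus $C_4(t_k, \theta^*) = (2\pi\lambda)^{-1/2}e^{-t_k^2/2\lambda}$ has bilateral Laplace transform $e^{\theta_k^2/2\lambda}$ at argument $\theta_k/\lambda$, and $C_4(t_k, \theta^*)\,e^{\theta_k t_k/\lambda}\,e^{-\theta_k^2/2\lambda}$ is the normal density $h(t_k)$ with mean $\theta_k$ and variance $\lambda$.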

It should be noted that although Theorems 2 and 3 require that $I_k(\Theta)$ be a constant, it is not necessary that all the components of $\Theta$ occurring in $I_k(\Theta)$ be constants. It is possible, for instance, that the components of $\Theta$ occurring in $I_k(\Theta)$ have a singular multivariate distribution such that all the probability is located on the surface $I_k(\theta) = \text{constant}$.

Obviously the sample mean from a normal population with constant variance satisfies Theorem 3. However, it is by no means the only such estimate. Let

$$f(x \mid \theta) = \frac{1}{x\sqrt{2\pi c}}\,e^{-(\ln x - \theta)^2/2c}, \qquad 0 < x < \infty,$$

where $c$ is a constant; this is the so-called logarithmico-normal distribution (see Cramér [6], p. 220). Then $t = \sum_{i=1}^n \ln x_i/n$ is normally distributed with mean $\theta$ and variance $c/n$, which is the minimum variance attainable under the extended inequality.
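A quick Monte Carlo sketch of the logarithmico-normal example: the parameter $\theta$ is drawn from a hypothetical uniform distribution $G$ (deliberately non-normal), data are generated with Python's `random.lognormvariate`, and the average of $(t-\theta)^2$ is compared with $c/n$. The sample size, $c$, the prior, and the seed are assumptions for illustration.

```python
import math
import random

def mse_log_mean(n=10, c=0.5, reps=20000, seed=1):
    """Monte Carlo estimate of EE[(t - theta)^2 | Theta] for t = mean of ln x_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        theta = rng.uniform(-2.0, 3.0)  # hypothetical, non-normal distribution G
        # ln x_i ~ N(theta, c); lognormvariate takes the standard deviation sqrt(c)
        xs = [rng.lognormvariate(theta, math.sqrt(c)) for _ in range(n)]
        t = sum(math.log(x) for x in xs) / n
        total += (t - theta) ** 2
    return total / reps

n, c = 10, 0.5
est = mse_log_mean(n, c)
print(est, c / n)  # the two numbers should be close
```

Changing the distribution of $\theta$ leaves the estimate near $c/n$, which is the point of Theorem 3 holding regardless of $G$.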


A situation in which a parameter is assumed to be a random variable is the analysis of variance model II of Eisenhart [9]. The simplest case is the one-way classification. Here the model is $x_{ij} = a_i + \epsilon_{ij}$, where $x_{ij}$ is the $j$th observation in the $i$th class. We assume there are $m$ classes, where the $i$th class has $n_i$ observations. Suppose the $a_i$ and $\epsilon_{ij}$ are random samples of size $m$ and $N$, $N = \sum_{i=1}^m n_i$, from two normally distributed populations having means $\mu$ and zero respectively, and variances $\sigma_a^2$ and $\sigma_e^2$ respectively. Then

$$\phi = \prod_{i=1}^m\prod_{j=1}^{n_i} f(x_{ij} \mid a_i) = \frac{1}{(2\pi\sigma_e^2)^{N/2}}\exp\left\{-\sum_{i=1}^m\sum_{j=1}^{n_i}\frac{(x_{ij}-a_i)^2}{2\sigma_e^2}\right\},$$

$$E\left(-\frac{\partial^2\ln\phi}{\partial a_i^2} \Bigm| \Theta\right) = \frac{n_i}{\sigma_e^2}.$$

The ML estimate for $a_i$ is $\hat a_i = \sum_{j=1}^{n_i} x_{ij}/n_i$, $i = 1, 2, \ldots, m$. Here $E(\hat a_i \mid a_i) = a_i$, and $EE[(\hat a_i - a_i)^2 \mid a_i] = \sigma_e^2/n_i$, which by (5.7) is the minimum mean square error. Notice that the assumption of normality of the $a_i$ was not required for equality.
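The remark that normality of the $a_i$ is not needed can be illustrated by simulation. The sketch below draws hypothetical class effects $a_i$ from a uniform distribution, uses normal errors with $\sigma_e = 2$, and compares the Monte Carlo mean square error of $\hat a_i$ with $\sigma_e^2/n_i$ for unequal class sizes; all numerical settings are assumptions.

```python
import random

def anova_mse(n_sizes, sigma_e=2.0, reps=20000, seed=7):
    """Monte Carlo EE[(a_hat_i - a_i)^2 | a_i] for each class of the one-way model II."""
    rng = random.Random(seed)
    sums = [0.0] * len(n_sizes)
    for _ in range(reps):
        for i, n_i in enumerate(n_sizes):
            a_i = rng.uniform(0.0, 10.0)  # hypothetical non-normal class effect
            obs = [a_i + rng.gauss(0.0, sigma_e) for _ in range(n_i)]
            a_hat = sum(obs) / n_i        # ML estimate of a_i
            sums[i] += (a_hat - a_i) ** 2
    return [s / reps for s in sums]

sizes = [2, 5, 10]
est = anova_mse(sizes)
print(est, [4.0 / n for n in sizes])  # each estimate should be close to sigma_e^2 / n_i
```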


6. Most unfavorable distributions.⁵ In most cases $G(\Theta)$ is not known, so the lower bound on the mean square error cannot be found. If $\psi_k(\Theta) = \theta_k$, it is of interest to know the greatest value the lower bound can attain, as well as the set of $G(\Theta)$ which produces it. To this end, define $G_k^*(\Theta)$ to be a most unfavorable distribution with respect to $\theta_k$ if $\int_{A_s} I_k(\theta)\,dG_k^*(\theta) \le \int_{A_s} I_k(\theta)\,dG(\theta)$ for all $G(\Theta)$ defined over $A_s$.

If $I_k(\Theta)$ has a unique minimum with respect to that subset of the parameters appearing therein, then a most unfavorable distribution is one for which the marginal distribution of these parameters is trivial, i.e., concentrated at the minimizing point. It may be that $I_k(\Theta)$ is independent of all parameters, so that all $G(\Theta)$ are most unfavorable distributions. A case in point is the Cauchy distribution,

$$f(x \mid \theta) = \frac{1}{\pi}\,\frac{1}{1 + (x-\theta)^2},$$

where

$$I(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty}\frac{(x-\theta)^2}{[1 + (x-\theta)^2]^3}\,dx = \frac{1}{2}.$$

Here $EE[(t-\theta)^2 \mid \Theta] \ge 2/n$ regardless of the form of $G(\Theta)$. There are also cases in which no most unfavorable distribution exists, except possibly when $A_s$ is restricted by some prior information.
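The value $I(\theta) = 1/2$ for the Cauchy distribution can be confirmed numerically. In this sketch the substitution $u = \tan\varphi$ turns the integral into $(4/\pi)\int_{-\pi/2}^{\pi/2}\sin^2\varphi\cos^2\varphi\,d\varphi$, which the midpoint rule evaluates accurately; the number of steps and the sample size $n$ are arbitrary choices.

```python
import math

def cauchy_information(steps=20000):
    """I(theta) = (4/pi) * integral over the real line of u^2 / (1 + u^2)^3 du.

    With u = tan(phi) the integrand becomes sin(phi)^2 cos(phi)^2 on (-pi/2, pi/2),
    so no infinite range has to be truncated."""
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        phi = a + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += (math.sin(phi) * math.cos(phi)) ** 2
    return 4.0 / math.pi * total * h

info = cauchy_information()
n = 10
print(info)             # close to 0.5
print(1 / (n * info))   # lower bound for n observations, close to 2/n = 0.2
```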


7. Most unfavorable distributions for some Laplacian distributions. M. C. K. Tweedie [16], [17] has called a distribution Laplacian if it belongs to the general class of distributions for which the sample mean is a sufficient statistic for one of its parameters. The general form of such a distribution's frequency function is

$$f(x \mid \theta_1, \theta_2) = e^{-x q(\theta_1) - \theta_2 F(\theta_1)}\,h(x, \theta_2).$$

This is, of course, a special case of the Pitman-Koopman form (1.3). Here we have

$$E[I(\theta_1, \theta_2)] = \iint_{A_2}\bigl[m(\theta_1, \theta_2)\,q''(\theta_1) + \theta_2 F''(\theta_1)\bigr]\,dG(\theta_1, \theta_2),$$

where $E(x \mid \theta_1, \theta_2) = m(\theta_1, \theta_2)$. If $I(\theta_1, \theta_2)$ has an absolute minimum for some subset $A_2^*$ of $A_2$, and $(\theta_1^*, \theta_2^*)$ is an element of $A_2^*$, then $E[I(\theta_1, \theta_2)] \ge m(\theta_1^*, \theta_2^*)\,q''(\theta_1^*) + \theta_2^* F''(\theta_1^*)$ for every $G(\theta_1, \theta_2)$ defined over $A_2$. Further, this is the absolute minimum attainable by $E[I(\theta_1, \theta_2)]$. It is reached when $dG(\theta_1, \theta_2) = 0$ for $(\theta_1, \theta_2)$ not an element of $A_2^*$. Thus, we have found a set of most unfavorable distributions. This result will now be applied to several specific Laplacian distributions.

⁵ For an analogous concept, least favorable distributions, see Wald ([18], p. 18).

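Type a. $\theta_2 = 1$. This includes the binomial, Pascal and Poisson distributions.

(1) Binomial distribution.

$$f(x \mid \theta_1) = \theta_1^x(1-\theta_1)^{1-x}, \qquad x = 0, 1, \quad 0 < \theta_1 < 1,$$

$$q(\theta_1) = -\ln\frac{\theta_1}{1-\theta_1}, \qquad F(\theta_1) = -\ln(1-\theta_1), \qquad m(\theta_1) = \theta_1,$$

and

$$I(\theta_1) = \frac{1}{\theta_1(1-\theta_1)}.$$

$I(\theta_1)$ has a unique absolute minimum at $\theta_1 = 1/2$, so that⁶ $G^*(\theta_1) = \epsilon(\theta_1 - 1/2)$ is the only most unfavorable distribution. Further, $E[I(\theta_1)] \ge 4$ per observation, that is, at least $4N$ for $N$ observations.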
(2) Pascal distribution.

$$f(x \mid \theta_1) = e^{x\ln(1-\theta_1) - r\ln[(1-\theta_1)/\theta_1]}\binom{x-1}{r-1}, \qquad x = r, r+1, \ldots; \quad 0 < \theta_1 < 1,$$

$$q(\theta_1) = -\ln(1-\theta_1), \qquad F(\theta_1) = \ln\frac{1-\theta_1}{\theta_1}, \qquad m(\theta_1) = r/\theta_1,$$

where $r$ is a known fixed positive integer. Here $I(\theta_1) = r/[\theta_1^2(1-\theta_1)]$, which has a unique absolute minimum at $\theta_1 = 2/3$. Therefore $G^*(\theta_1) = \epsilon(\theta_1 - 2/3)$ is the only most unfavorable distribution, and $E[I(\theta_1)] \ge 27r/4$ per observation (at least $27rN/4$ for $N$ observations). It should be noted that $x/r$ is not an unbiased estimate of $\theta_1$. If we consider $\alpha = 1/\theta_1$ as the parameter to be estimated, then

$$f(x \mid \alpha) = e^{-x\ln[\alpha/(\alpha-1)] - r\ln(\alpha-1)}\binom{x-1}{r-1}, \qquad x = r, r+1, \ldots; \quad 1 < \alpha < \infty,$$

$$q(\alpha) = \ln\frac{\alpha}{\alpha-1}, \qquad F(\alpha) = \ln(\alpha-1), \qquad m(\alpha) = r\alpha,$$

and

$$I(\alpha) = \frac{r}{\alpha(\alpha-1)}.$$

Here $x/r$ is an unbiased estimate of $\alpha$. However, $I(\alpha)$ does not have an absolute minimum in $A_1$, i.e., $1 < \alpha < \infty$, but rather it has a limit of zero as $\alpha \to \infty$. Thus letting $\alpha \to \infty$ produces a most unfavorable situation, which is equivalent to letting $\theta_1 \to 0$. It is interesting to note that though $\theta_1 = 2/3$ was the most unfavorable situation when estimating $\theta_1$, when estimating $\alpha = 1/\theta_1$ we have, in effect, that $\theta_1 = 0$ is the most unfavorable situation. Thus we have established that "most unfavorableness" is not an invariant property.

⁶ Following Cramér ([6], p. 192), the distribution function $\epsilon(x - a) = 0$ for $x < a$ and $\epsilon(x - a) = 1$ for $x \ge a$.
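(3) Poisson distribution.

$$f(x \mid \theta_1) = \frac{e^{x\ln\theta_1 - \theta_1}}{x!}, \qquad x = 0, 1, 2, \ldots; \quad 0 < \theta_1 < \infty,$$

$$q(\theta_1) = -\ln\theta_1, \qquad F(\theta_1) = \theta_1, \qquad m(\theta_1) = \theta_1, \qquad I(\theta_1) = 1/\theta_1.$$

$I(\theta_1)$ has no absolute minimum in $A_1$, but rather has a limit of zero as $\theta_1 \to \infty$. However, if from some prior consideration we can restrict $\theta_1 \le a$, then a most unfavorable distribution is $G^*(\theta_1) = \epsilon(\theta_1 - a)$.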
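The three Type a examples can be checked numerically. The sketch assumes the Laplacian form is written $f(x \mid \theta_1, \theta_2) = e^{-xq(\theta_1) - \theta_2 F(\theta_1)}h(x, \theta_2)$, so that $I = m\,q'' + \theta_2 F''$; it approximates $q''$ and $F''$ by central finite differences and searches a grid for the minimum. The value $r = 3$, the grid, and the step size are assumptions. It reproduces the minima at $\theta_1 = 1/2$ (binomial) and $\theta_1 = 2/3$ (Pascal) and shows $I(\alpha)$ decreasing toward zero.

```python
import math

def second_derivative(fn, x, h=1e-4):
    """Central finite-difference approximation to fn''(x)."""
    return (fn(x + h) - 2.0 * fn(x) + fn(x - h)) / (h * h)

def information(q, F, m, theta1, theta2=1.0):
    """I = m q'' + theta2 F'' for f = exp{-x q(theta1) - theta2 F(theta1)} h(x, theta2)."""
    return m(theta1) * second_derivative(q, theta1) + theta2 * second_derivative(F, theta1)

R = 3  # hypothetical Pascal parameter r

# Binomial: q = -ln[theta/(1 - theta)], F = -ln(1 - theta), m = theta
def q_bin(t):
    return -math.log(t / (1.0 - t))

def F_bin(t):
    return -math.log(1.0 - t)

def m_bin(t):
    return t

# Pascal in theta: q = -ln(1 - theta), F = ln[(1 - theta)/theta], m = r / theta
def q_pas(t):
    return -math.log(1.0 - t)

def F_pas(t):
    return math.log((1.0 - t) / t)

def m_pas(t):
    return R / t

# Pascal in alpha = 1/theta: q = ln[alpha/(alpha - 1)], F = ln(alpha - 1), m = r alpha
def q_alpha(a):
    return math.log(a / (a - 1.0))

def F_alpha(a):
    return math.log(a - 1.0)

def m_alpha(a):
    return R * a

def grid_minimum(q, F, m, theta2, grid):
    """(argmin, min) of I over a grid of theta1 values."""
    return min(((t, information(q, F, m, t, theta2)) for t in grid), key=lambda pair: pair[1])

grid = [k / 1000 for k in range(50, 951)]
print(grid_minimum(q_bin, F_bin, m_bin, 1.0, grid))       # near (0.5, 4.0)
print(grid_minimum(q_pas, F_pas, m_pas, float(R), grid))  # near (2/3, 27 r / 4 = 20.25)
print([information(q_alpha, F_alpha, m_alpha, a, float(R)) for a in (2.0, 10.0, 100.0)])  # decreasing
```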

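Type b. $\theta_2 \ne 1$, $q(\theta_1) = \theta_1$. Immediately we have $q''(\theta_1) = 0$, and $I(\theta_1, \theta_2) = \theta_2 F''(\theta_1)$.

(1) Gamma distribution.

$$f(x \mid \theta_1, \theta_2) = e^{-\theta_1 x + \theta_2\ln\theta_1}\,\bigl[x^{\theta_2-1}/\Gamma(\theta_2)\bigr], \qquad x > 0, \quad \theta_2 > 0, \quad \theta_1 > 0,$$

$$F(\theta_1) = -\ln\theta_1, \qquad I(\theta_1, \theta_2) = \theta_2/\theta_1^2.$$

Here $I(\theta_1, \theta_2) \to 0$ as $\theta_1 \to \infty$ and/or $\theta_2 \to 0$, but no most unfavorable distribution can be cited unless we assume $\theta_2/\theta_1^2 \ge a$.

(2) Normal distribution (parameters adjusted).

$$f(x \mid \theta_1, \theta_2) = \frac{e^{-x^2/2\theta_2}\,e^{-x\theta_1 - \theta_2\theta_1^2/2}}{\sqrt{2\pi\theta_2}}, \qquad -\infty < x < \infty, \quad \theta_2 > 0, \quad -\infty < \theta_1 < \infty,$$

where $\theta_2 = \sigma^2$ and $\theta_1 = -\mu/\sigma^2$ in the usual notation, and

$$F(\theta_1) = \theta_1^2/2, \qquad I(\theta_1, \theta_2) = \theta_2.$$

Here $I(\theta_1, \theta_2)$ decreases to zero as $\theta_2 \to 0$, so that if we can restrict $\theta_2 \ge a$, then $\epsilon(\theta_2 - a)$ is a most unfavorable distribution.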


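8. More stringent inequalities. Bhattacharyya [4] has found greater lower bounds for the mean square errors of estimates in the case of constant parameters. This admits of direct extension to the present case. We can write (3.3) and (3.4), respectively, as

$$E\left(\frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k} \Bigm| \theta\right) = 0 \tag{8.1}$$

and

$$\frac{\partial\psi_k(\theta)}{\partial\theta_k} = \operatorname{Cov}\left(t_k - \theta_k,\ \frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k} \Bigm| \theta\right). \tag{8.2}$$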





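By the result of the appendix,

$$\operatorname{Cov}\left(t_k - \theta_k, \frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k}\right) = E\left[\operatorname{Cov}\left(t_k - \theta_k, \frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k} \Bigm| \Theta\right)\right] + \operatorname{Cov}\left[E(t_k - \theta_k \mid \Theta),\ E\left(\frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k} \Bigm| \Theta\right)\right],$$

from which, by using (8.1) and (8.2), we obtain

$$E\left[\frac{\partial\psi_k(\Theta)}{\partial\theta_k}\right] = \operatorname{Cov}\left(t_k - \theta_k, \frac{1}{\phi}\frac{\partial\phi}{\partial\theta_k}\right). \tag{8.3}$$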


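With suitable regularity conditions on $\phi$ and its derivatives similar to those cited in paragraph 3, we can differentiate (3.2) $p$ times and obtain, as in (8.3),

$$E\left[\frac{\partial^\beta\psi_k(\Theta)}{\partial\theta_k^\beta}\right] = \operatorname{Cov}\left(t_k - \theta_k, \frac{1}{\phi}\frac{\partial^\beta\phi}{\partial\theta_k^\beta}\right), \qquad \beta = 1, 2, \ldots, p. \tag{8.4}$$

Define

$$J_{\alpha\beta} = \operatorname{Cov}\left(\frac{1}{\phi}\frac{\partial^\alpha\phi}{\partial\theta_k^\alpha}, \frac{1}{\phi}\frac{\partial^\beta\phi}{\partial\theta_k^\beta}\right), \qquad \alpha, \beta = 1, \ldots, p.$$

Let $J = [J_{\alpha\beta}]$ and $J^{-1} = [J^{\alpha\beta}]$. Denote by $R_{0\cdot 12\cdots p}$ the multiple correlation coefficient between $t_k - \theta_k$ and $(1/\phi)(\partial\phi/\partial\theta_k), (1/\phi)(\partial^2\phi/\partial\theta_k^2), \ldots, (1/\phi)(\partial^p\phi/\partial\theta_k^p)$. Then, by a result cited by Wilks ([20], p. 42 ff.),

$$R_{0\cdot 12\cdots p}^2 = \frac{\sum_{\alpha=1}^p\sum_{\beta=1}^p E\left(\frac{\partial^\alpha\psi_k(\Theta)}{\partial\theta_k^\alpha}\right)E\left(\frac{\partial^\beta\psi_k(\Theta)}{\partial\theta_k^\beta}\right)J^{\alpha\beta}}{\operatorname{Var}(t_k - \theta_k)}.$$

Since $R_{0\cdot 12\cdots p}^2 \le 1$, we may write

$$\operatorname{Var}(t_k - \theta_k) \ge \sum_{\alpha=1}^p\sum_{\beta=1}^p E\left(\frac{\partial^\alpha\psi_k(\Theta)}{\partial\theta_k^\alpha}\right)E\left(\frac{\partial^\beta\psi_k(\Theta)}{\partial\theta_k^\beta}\right)J^{\alpha\beta},$$

from which we have

$$EE[(t_k-\theta_k)^2 \mid \Theta] \ge E^2[\psi_k(\Theta) - \theta_k] + \sum_{\alpha=1}^p\sum_{\beta=1}^p E\left(\frac{\partial^\alpha\psi_k(\Theta)}{\partial\theta_k^\alpha}\right)E\left(\frac{\partial^\beta\psi_k(\Theta)}{\partial\theta_k^\beta}\right)J^{\alpha\beta}. \tag{8.5}$$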





This lower bound is at least as great as that of (3.8), since the multiple correlation between $t_k - \theta_k$ and the above series of variates is at least as large as the simple correlation between $t_k - \theta_k$ and $(1/\phi)(\partial\phi/\partial\theta_k)$. This latter correlation is essentially what was used in deriving (3.8). It should be noted that this method of obtaining a higher lower bound applies only if $\psi_k(\Theta)$ is a non-linear function of $\theta_k$, and consequently it is not applicable in the unbiased case. As noted in Wilks ([20], p. 46), the equality holds if and only if all the probability in the $(p+1)$-dimensional space of the random variables lies on the surface

$$t_k - \theta_k - E[\psi_k(\Theta)] + E(\theta_k) = \sum_{\alpha=1}^p\sum_{\beta=1}^p E\left(\frac{\partial^\beta\psi_k(\Theta)}{\partial\theta_k^\beta}\right)J^{\alpha\beta}\,\frac{1}{\phi}\frac{\partial^\alpha\phi}{\partial\theta_k^\alpha}.$$
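A small numerical instance of (8.5), as a sketch under stated assumptions: a single Poisson observation, a hypothetical two-point distribution $G$ putting mass 1/2 on $\theta = 1$ and on $\theta = 2$, and the hypothetical estimate $t(x) = x(x-1)$ of $\theta$, for which $\psi(\theta) = \theta^2$ is non-linear. With $p = 2$ the bound on $\operatorname{Var}(t - \theta)$ rises from 12 (the single-derivative term used in (3.8)) to 15.2, while the actual variance is 24.

```python
import math

PRIOR = [(1.0, 0.5), (2.0, 0.5)]  # hypothetical two-point distribution G of Theta
XMAX = 80                         # truncation of the Poisson support

def pmf(x, theta):
    return math.exp(-theta) * theta ** x / math.factorial(x)

def expect(fn):
    """Joint expectation over Theta ~ G and X | Theta ~ Poisson(Theta)."""
    return sum(w * pmf(x, th) * fn(x, th) for th, w in PRIOR for x in range(XMAX))

def s1(x, th):  # (1/phi) d phi / d theta
    return x / th - 1.0

def s2(x, th):  # (1/phi) d^2 phi / d theta^2
    return (x / th - 1.0) ** 2 - x / th ** 2

def t(x):       # unbiased for theta^2, used here as a (biased) estimate of theta
    return x * (x - 1)

mean_d = expect(lambda x, th: t(x) - th)
var_d = expect(lambda x, th: (t(x) - th) ** 2) - mean_d ** 2  # Var(t - theta)

# J matrix (the scores have mean zero) and the vector E[d^beta psi / d theta^beta]
j11 = expect(lambda x, th: s1(x, th) ** 2)
j22 = expect(lambda x, th: s2(x, th) ** 2)
j12 = expect(lambda x, th: s1(x, th) * s2(x, th))
g1 = expect(lambda x, th: 2.0 * th)  # E[psi'(Theta)]
g2 = 2.0                             # E[psi''(Theta)]

bound_p1 = g1 ** 2 / j11             # single-derivative term, as in (3.8)
det = j11 * j22 - j12 ** 2
bound_p2 = (g1 ** 2 * j22 - 2.0 * g1 * g2 * j12 + g2 ** 2 * j11) / det  # p = 2 term of (8.5)
print(var_d, bound_p1, bound_p2)     # approximately 24, 12, 15.2
```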


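9. The Sequential Case. Wolfowitz [21] has extended the Cramér-Rao inequality to situations where the sample size is a random variable depending on the sequence of observations. In our notation this result is

$$E[(t_k-\theta_k)^2 \mid \theta] \ge [\psi_k(\theta) - \theta_k]^2 + \frac{[\partial\psi_k(\theta)/\partial\theta_k]^2}{E(n \mid \theta)\,E\left[\left(\frac{\partial\ln f(x \mid \theta)}{\partial\theta_k}\right)^2 \Bigm| \theta\right]}.$$

We shall proceed to extend this result to the case where $\Theta$ is a random variable. Under suitable regularity conditions Wolfowitz has shown

$$E\left(\frac{\partial\ln\phi}{\partial\theta_k} \Bigm| \theta\right) = 0$$

and

$$E\left[\left(\frac{\partial\ln\phi}{\partial\theta_k}\right)^2 \Bigm| \theta\right] = E(n \mid \theta)\,E\left[\left(\frac{\partial\ln f(x \mid \theta)}{\partial\theta_k}\right)^2 \Bigm| \theta\right]. \tag{9.1}$$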


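By definition,

$$\sum_{j=1}^{\infty}\int_{R_j} t_k(x_1, x_2, \ldots, x_j)\prod_{i=1}^{j} f(x_i \mid \theta)\,dx_i = \psi_k(\theta).$$

Under the regularity conditions cited by Wolfowitz we may differentiate under the integral sign and obtain

$$\frac{\partial\psi_k(\theta)}{\partial\theta_k} = E\left(t_k\frac{\partial\ln\phi}{\partial\theta_k} \Bigm| \theta\right) = \operatorname{Cov}\left(t_k - \theta_k, \frac{\partial\ln\phi}{\partial\theta_k} \Bigm| \theta\right).$$

The result of the appendix yields

$$E\left[\frac{\partial\psi_k(\Theta)}{\partial\theta_k}\right] = \operatorname{Cov}\left(t_k - \theta_k, \frac{\partial\ln\phi}{\partial\theta_k}\right).$$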


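Since the square of the correlation coefficient of any two variates cannot exceed unity, we have

$$E^2\left[\frac{\partial\psi_k(\Theta)}{\partial\theta_k}\right] \le \operatorname{Var}\left(\frac{\partial\ln\phi}{\partial\theta_k}\right)\operatorname{Var}(t_k - \theta_k).$$

Since $\partial\ln\phi/\partial\theta_k$ has conditional mean zero, (9.1) gives $\operatorname{Var}(\partial\ln\phi/\partial\theta_k) = E\{E(n \mid \Theta)\,E[(\partial\ln f(x \mid \Theta)/\partial\theta_k)^2 \mid \Theta]\}$, so that

$$\operatorname{Var}(t_k - \theta_k) \ge \frac{E^2[\partial\psi_k(\Theta)/\partial\theta_k]}{E\left\{E(n \mid \Theta)\,E\left[\left(\frac{\partial\ln f(x \mid \Theta)}{\partial\theta_k}\right)^2 \Bigm| \Theta\right]\right\}}.$$

This may be written

$$EE[(t_k-\theta_k)^2 \mid \Theta] \ge E^2[\psi_k(\Theta) - \theta_k] + \frac{E^2[\partial\psi_k(\Theta)/\partial\theta_k]}{E\left\{E(n \mid \Theta)\,E\left[\left(\frac{\partial\ln f(x \mid \Theta)}{\partial\theta_k}\right)^2 \Bigm| \Theta\right]\right\}}. \tag{9.2}$$

When $\psi_k(\Theta) = \theta_k$, (9.2) becomes

$$EE[(t_k-\theta_k)^2 \mid \Theta] \ge \frac{1}{E\left\{E(n \mid \Theta)\,E\left[\left(\frac{\partial\ln f(x \mid \Theta)}{\partial\theta_k}\right)^2 \Bigm| \Theta\right]\right\}}.$$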

These results are valid for discrete as well as continuous distributions. 

A simple example of sequential estimation involves sampling from a binomial population until a specified number of successes, say $r$, occur. Here $f(x \mid \theta) = \theta^x(1-\theta)^{1-x}$, $x = 0, 1$, and $E(n \mid \theta) = r/\theta$. Therefore, for $\psi(\theta) = \theta$,

$$EE[(t-\theta)^2 \mid \Theta] \ge \frac{1}{E\left[\dfrac{r}{\Theta^2(1-\Theta)}\right]}.$$

This result corresponds exactly to that obtained in paragraph 7 for the Pascal distribution.
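The sequential bound can be compared with an exact calculation. The sketch below assumes the classical unbiased estimate $t = (r-1)/(n-1)$ under inverse binomial sampling (the text does not fix a particular $t$), a hypothetical three-point distribution $G$, and $r = 3$; the sums over $n$ are truncated.

```python
import math

def prob_n(j, theta, r):
    """P(n = j | theta): the r-th success occurs on trial j."""
    return math.comb(j - 1, r - 1) * theta ** r * (1.0 - theta) ** (j - r)

def inverse_binomial_mse(prior, r, jmax=2000):
    """Exact EE[(t - theta)^2 | Theta] for t = (r - 1)/(n - 1), r >= 2, truncated at jmax."""
    total = 0.0
    for theta, weight in prior:
        mse = sum(prob_n(j, theta, r) * ((r - 1) / (j - 1) - theta) ** 2 for j in range(r, jmax))
        total += weight * mse
    return total

def conditional_mean(theta, r, jmax=2000):
    """E[t | theta]; equals theta because (r - 1)/(n - 1) is unbiased."""
    return sum(prob_n(j, theta, r) * (r - 1) / (j - 1) for j in range(r, jmax))

def sequential_bound(prior, r):
    """Bound for psi(theta) = theta: 1 / E[r / (theta^2 (1 - theta))], using E(n | theta) = r / theta."""
    return 1.0 / sum(weight * r / (theta ** 2 * (1.0 - theta)) for theta, weight in prior)

prior = [(0.3, 1 / 3), (0.5, 1 / 3), (0.7, 1 / 3)]  # hypothetical distribution G
r = 3
print(inverse_binomial_mse(prior, r), sequential_bound(prior, r))
```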


10. Linear Estimation of $E(\theta_k)$. Consider $m$ samples $X_i = (x_{i1}, x_{i2}, \ldots, x_{in_i})$, $i = 1, 2, \ldots, m$, chosen from populations $f(x \mid {}^i\theta)$, which are randomly and independently chosen from a super-population of populations with frequency functions of the form $f(x \mid \theta)$. Thus for each sample $X_i$ there is associated an unobserved random variable ${}^i\Theta$, $i = 1, 2, \ldots, m$, with distribution $G({}^i\Theta)$. We seek an estimate of $E(\theta_k)$, say $T_k$, where $1 \le k \le s$. It is supposed that for each sample there exists an unbiased estimate of ${}^i\theta_k$, namely ${}^it_k$, and we restrict our discussion to the set of $T_k$ which are linear functions of the ${}^it_k$, that is,

$$T_k = \sum_{i=1}^m c_i\,{}^it_k,$$

where $(c_1, c_2, \ldots, c_m)$ is a vector of real numbers. If we further restrict ourselves to unbiased estimates of $E(\theta_k)$, it follows that $\sum c_i = 1$.

The minimum variance unbiased estimate of $E(\theta_k)$, $\hat T_k$, is found by minimizing the expression $\operatorname{Var}(T_k) = \sum_{i=1}^m c_i^2\operatorname{Var}({}^it_k)$ (the ${}^it_k$ being independent) with respect to the $c$'s, subject to the restriction $\sum_{i=1}^m c_i = 1$. This yields the normal equations

$$\hat c_i\operatorname{Var}({}^it_k) + \lambda = 0, \quad i = 1, 2, \ldots, m, \qquad \sum_{i=1}^m \hat c_i = 1, \tag{10.1}$$

where $\lambda$ is a Lagrangian multiplier.
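Consider now the variance of the minimum variance estimate $\hat T_k$, found by solving (10.1). We have

$$\operatorname{Var}(\hat T_k) = \sum_{i=1}^m \hat c_i^2\operatorname{Var}({}^it_k). \tag{10.2}$$

By the result of the appendix,

$$\operatorname{Var}({}^it_k) = E[\operatorname{Var}({}^it_k \mid \Theta)] + \operatorname{Var}[E({}^it_k \mid \Theta)],$$

from which, since $E({}^it_k \mid \Theta) = {}^i\theta_k$, it follows that

$$\operatorname{Var}({}^it_k) = EE[({}^it_k - {}^i\theta_k)^2 \mid \Theta] + \operatorname{Var}(\theta_k). \tag{10.3}$$

So (10.2) may be written

$$\operatorname{Var}(\hat T_k) = \sum_{i=1}^m \hat c_i^2\,EE[({}^it_k - {}^i\theta_k)^2 \mid \Theta] + \operatorname{Var}(\theta_k)\sum_{i=1}^m \hat c_i^2. \tag{10.4}$$

Applying (3.9) to (10.4), we have

$$\operatorname{Var}(\hat T_k) \ge \sum_{i=1}^m \frac{\hat c_i^2}{EE\left[-\dfrac{\partial^2\ln\phi(X_i \mid \Theta)}{\partial\theta_k^2} \Bigm| \Theta\right]} + \operatorname{Var}(\theta_k)\sum_{i=1}^m \hat c_i^2, \tag{10.5}$$

the equality being achieved under the conditions cited in paragraph 5.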
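We may apply these results to the analysis of variance model II cited in paragraph 5. To simplify the normal equations above, let $n_i = n$, $i = 1, 2, \ldots, m$. In the notation of this paragraph, ${}^i\theta_k = a_i$, ${}^it_k = \hat a_i = (\sum_{j=1}^n x_{ij})/n$, $i = 1, 2, \ldots, m$, and (10.1) becomes $\hat c_i[(\sigma_e^2/n) + \sigma_a^2] + \lambda = 0$, $\sum_{i=1}^m \hat c_i = 1$. Solving, we have $\hat c_i = 1/m$, $i = 1, 2, \ldots, m$. Thus

$$\hat T = \left(\sum_{i=1}^m \hat a_i\right)\Big/ m = \left(\sum_{i=1}^m\sum_{j=1}^n x_{ij}\right)\Big/ mn,$$

$$\operatorname{Var}(\hat T) = \frac{1}{m}\left(\frac{\sigma_e^2}{n} + \sigma_a^2\right),$$

which equals the lower bound given by (10.5).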
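The normal equations (10.1) have the closed-form solution $\hat c_i \propto 1/\operatorname{Var}({}^it_k)$. The sketch below solves them and evaluates (10.2) for model II, where $\operatorname{Var}(\hat a_i) = \sigma_e^2/n_i + \sigma_a^2$ by (10.3). The unequal class sizes in the second call go beyond the equal-$n$ case treated in the text and, like the variance values, are hypothetical.

```python
def min_variance_weights(variances):
    """Solve (10.1): c_i Var(t_i) + lam = 0, sum c_i = 1.

    Each c_i is proportional to 1 / Var(t_i); lam is the Lagrangian multiplier."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [w / total for w in inv]
    lam = -1.0 / total  # c_i Var(t_i) = 1 / total for every i
    return weights, lam

def variance_of_T(weights, variances):
    """(10.2): Var(T) = sum c_i^2 Var(t_i), the t_i being independent."""
    return sum(c * c * v for c, v in zip(weights, variances))

def model_ii_variances(sigma_e2, sigma_a2, sizes):
    """(10.3) for model II: Var(a_hat_i) = sigma_e^2 / n_i + sigma_a^2."""
    return [sigma_e2 / n + sigma_a2 for n in sizes]

# Equal class sizes, as in the text: c_i = 1/m and Var(T) = (sigma_e^2 / n + sigma_a^2) / m
v_eq = model_ii_variances(4.0, 1.0, [5, 5, 5])
w_eq, lam_eq = min_variance_weights(v_eq)
print(w_eq, variance_of_T(w_eq, v_eq))  # weights 1/3 each, variance 0.6

# Hypothetical unequal class sizes: classes with more observations get more weight
v_un = model_ii_variances(4.0, 1.0, [2, 4, 8])
w_un, lam_un = min_variance_weights(v_un)
print(w_un, variance_of_T(w_un, v_un))
```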


11. Acknowledgements. The author wishes to express his appreciation to 
Professors C. W. Clunies-Ross and J. E. Freund for their suggestions. 


APPENDIX
The Covariance in Terms of Conditional Expectations.
Let $U = (u_1, u_2, \ldots, u_r)$ and $V = (v_1, v_2, \ldots, v_s)$ be random variables. Assume $p = p(U, V)$ and $q = q(U, V)$ have finite means and variances. Then we have

(i) $\operatorname{Cov}(p, q) = EE(p\cdot q \mid V) - EE(p \mid V)\,EE(q \mid V)$.

But

$$\operatorname{Cov}(p, q \mid V) = E(p\cdot q \mid V) - E(p \mid V)\,E(q \mid V),$$

from which, taking expectations with respect to $V$, we have

$$E[\operatorname{Cov}(p, q \mid V)] = EE(p\cdot q \mid V) - E[E(p \mid V)\cdot E(q \mid V)].$$

Substituting this result in (i) gives

(ii) $\operatorname{Cov}(p, q) = E[\operatorname{Cov}(p, q \mid V)] + E[E(p \mid V)\cdot E(q \mid V)] - EE(p \mid V)\,EE(q \mid V)$,

(iii) $\operatorname{Cov}(p, q) = E[\operatorname{Cov}(p, q \mid V)] + \operatorname{Cov}[E(p \mid V), E(q \mid V)]$. Q.E.D.
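The identity (iii) is exact, so it can be verified with rational arithmetic on any small discrete joint distribution. The distribution and the functions $p = u + v$, $q = uv$ below are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution of (U, V), given as {(u, v): probability}
JOINT = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 8), (3, 2): Fraction(1, 8),
}

def p(u, v):
    return u + v

def q(u, v):
    return u * v

def total_covariance(joint, pf, qf):
    """Cov(p, q) computed directly from the joint distribution."""
    def e(fn):
        return sum(w * fn(u, v) for (u, v), w in joint.items())
    return e(lambda u, v: pf(u, v) * qf(u, v)) - e(pf) * e(qf)

def decomposed_covariance(joint, pf, qf):
    """E[Cov(p, q | V)] + Cov[E(p | V), E(q | V)], as in (iii)."""
    stats = []
    for v in sorted({v for (_, v) in joint}):
        cell = [(u, w) for (u, vv), w in joint.items() if vv == v]
        pv = sum(w for _, w in cell)                                  # P(V = v)
        ep = sum(w * pf(u, v) for u, w in cell) / pv                  # E(p | V = v)
        eq = sum(w * qf(u, v) for u, w in cell) / pv                  # E(q | V = v)
        epq = sum(w * pf(u, v) * qf(u, v) for u, w in cell) / pv      # E(pq | V = v)
        stats.append((pv, ep, eq, epq))
    mean_cond_cov = sum(pv * (epq - ep * eq) for pv, ep, eq, epq in stats)
    mean_p = sum(pv * ep for pv, ep, _, _ in stats)
    mean_q = sum(pv * eq for pv, _, eq, _ in stats)
    cov_of_means = sum(pv * ep * eq for pv, ep, eq, _ in stats) - mean_p * mean_q
    return mean_cond_cov + cov_of_means

print(total_covariance(JOINT, p, q), decomposed_covariance(JOINT, p, q))  # identical fractions
```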


REFERENCES
[1] A. C. AITKEN AND H. SILVERSTONE, "On the estimation of statistical parameters," Proc. Roy. Soc. Edinburgh, Sect. A, Vol. 61 (1942), pp. 186-194.
[2] EDWARD W. BARANKIN, "Concerning some inequalities in the theory of statistical estimation," Skand. Aktuarietids., Vol. 34 (1951), pp. 35-40.
[3] E. W. BARANKIN, "Locally best unbiased estimates," Ann. Math. Stat., Vol. 20 (1949), pp. 477-501.
[4] A. BHATTACHARYYA, "On some analogues to the amount of information and their uses in statistical estimation," Sankhyā, Vol. 8 (1946), pp. 1-14.
[5] DOUGLAS G. CHAPMAN AND HERBERT ROBBINS, "Minimum variance estimation without regularity conditions," Ann. Math. Stat., Vol. 22 (1951), pp. 581-586.
[6] HARALD CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, 1946.
[7] HARALD CRAMÉR, "Contributions to the theory of statistical estimation," Skand. Aktuarietids., Vol. 29 (1946), pp. 85-94.
[8] G. DARMOIS, "Sur les limites de la dispersion de certaines estimations," Rev. Inst. Int. Statist., Vol. 13 (1945), pp. 9-15.
[9] CHURCHILL EISENHART, "The assumptions underlying the analysis of variance," Biometrics, Vol. 3 (1947), pp. 1-26.
[10] M. FRÉCHET, "Sur l'extension de certaines évaluations statistiques au cas de petits échantillons," Rev. Inst. Int. Statist., Vol. 11 (1943), pp. 182-205.
[11] J. M. HAMMERSLEY, "On estimating restricted parameters," Jour. Roy. Stat. Soc. B, Vol. 12 (1950), pp. 192-240.
[12] B. O. KOOPMAN, "On distributions admitting a sufficient statistic," Trans. Amer. Math. Soc., Vol. 39 (1936), pp. 399-409.
[13] E. J. G. PITMAN, "Sufficient statistics and intrinsic accuracy," Proc. Cambridge Philos. Soc., Vol. 32 (1936), pp. 567-579.
[14] C. RADHAKRISHNA RAO, "Information and accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., Vol. 37 (1945), pp. 81-91.
[15] M. P. SCHÜTZENBERGER, "A generalization of the Fréchet-Cramér inequality to the case of Bayes' estimation" (Abstract), Bull. Amer. Math. Soc., Vol. 63 (1957), p. 142.
[16] M. C. K. TWEEDIE, "Functions of a statistical variate with special reference to Laplacian distributions," Proc. Cambridge Philos. Soc., Vol. 43 (1947), pp. 41-49.
[17] M. C. K. TWEEDIE, Notes on Generating Functions and Distribution Theory, Virginia Polytechnic Institute, 1953.
[18] ABRAHAM WALD, Statistical Decision Functions, John Wiley and Sons, Inc., New York, 1950.
[19] DAVID VERNON WIDDER, The Laplace Transform, Princeton University Press, 1946.
[20] S. S. WILKS, Mathematical Statistics, Princeton University Press, 1950.
[21] J. WOLFOWITZ, "The efficiency of sequential estimates and Wald's equation for sequential processes," Ann. Math. Stat., Vol. 18 (1947), pp. 215-230.





ON THE ATTAINMENT OF CRAMER-RAO AND BHATTACHARYYA 
BOUNDS FOR THE VARIANCE OF AN ESTIMATE 


By A. V. FEND
Stanford Research Institute 


Summary. If a variable $X$ has density function $f(x, \theta)$, then in many cases the Cramér-Rao bound or the Bhattacharyya bounds may be used to show that a function $d(x)$ is a uniformly minimum variance unbiased estimate of the real parameter $\theta$.

In this paper it is shown that if $f(x, \theta)$ is a member of the family of densities of the Darmois-Koopman form, and if the variance of $d(x)$ achieves the $k$th Bhattacharyya bound but not the $(k-1)$th bound, then $f(x, \theta) = \exp[t(x)g(\theta) + g_0(\theta) + h(x)]$ and $d(x)$ is a polynomial in $t(x)$ of degree $k$. Further, the variance of any polynomial in $t(x)$ of degree $k$ will achieve the $k$th bound, so that if any such unbiased polynomial exists, it will necessarily be uniformly minimum variance unbiased. Some properties of these polynomial estimates are discussed.


Introduction. We will consider a one-parameter family of density functions $f(x, \theta)$, $\theta \in \Omega$, such that

$$P\{X \in A\} = \int_A f(x, \theta)\,d\mu(x).$$

The variable $X$ is possibly vector valued, as in the case where a random sample is observed, the set $\Omega$ is any set of real numbers, and $\mu$ is a measure independent of $\theta$.

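Conditions under which a Bhattacharyya bound is a valid lower bound for the variance of an estimate $d(x)$ have been discussed in [1], [2], [4], and [5]. The conditions given by Wolfowitz [5] are for the sequential estimation problem. For the nonsequential case these can be written as

(1) (a) $\Omega$ is the entire real line, or an open interval of the real line.

(b) $d(x)$ has finite variance.

(c) Both $\int f(x, \theta)\,d\mu(x)$ and $\int d(x)f(x, \theta)\,d\mu(x)$ are differentiable under the integral sign with respect to $\theta$. Specifically, if $\phi_i = [1/f(x, \theta)][\partial^i f(x, \theta)/\partial\theta^i]$, then, for almost all $x$, $\phi_i$ exists for all $\theta \in \Omega$, $i = 1, \ldots, k$. The exceptional sets of $x$'s do not depend on $\theta$. Denote $E[d(X)\phi_i]$ by $A_i$.

(d) The covariance matrix of the $\phi_i$ exists and is non-singular for $\theta \in \Omega$.

Then, if (1) is satisfied,

$$\sigma_d^2 \ge -\frac{\begin{vmatrix} 0 & A_1 & \cdots & A_k \\ A_1 & E\phi_1^2 & \cdots & E\phi_1\phi_k \\ \vdots & \vdots & & \vdots \\ A_k & E\phi_k\phi_1 & \cdots & E\phi_k^2 \end{vmatrix}}{\begin{vmatrix} E\phi_1^2 & \cdots & E\phi_1\phi_k \\ \vdots & & \vdots \\ E\phi_k\phi_1 & \cdots & E\phi_k^2 \end{vmatrix}}. \tag{2}$$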
Received March 31, 1958; revised January 23, 1959. 


381 





382 A. V. FEND 


The right member of (2) is the $k$th Bhattacharyya bound. The Cramér-Rao bound is obtained by setting $k = 1$.

We will assume that the regularity conditions in (1) are satisfied, for some $k$, for all of the density functions considered here. However, it is worth noting that in many special cases condition (c) of (1) will follow from the fact that a Laplace transform may be differentiated under the integral sign or, in other cases, from the fact that term-by-term differentiation is permitted whenever $Ed(X)$ is a finite sum, as in the binomial distribution.
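A sketch of how the right member of (2) can be evaluated, assuming a single Poisson observation and $d(x) = x(x-1)$, an unbiased estimate of $\theta^2$ that is a polynomial of degree 2 in $t(x) = x$. The bordered determinant is computed by Gaussian elimination and the moments are truncated sums. At $\theta = 1.5$ the variance of $d$ is 18, the first (Cramér-Rao) bound is $4\theta^3 = 13.5$, and the second bound equals the variance, in line with Theorem 2 below.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def bhattacharyya_bound(A, S):
    """k-th bound of (2): minus the bordered determinant |0 A'; A S| divided by |S|."""
    k = len(A)
    bordered = [[0.0] + list(A)] + [[A[i]] + list(S[i]) for i in range(k)]
    return -det(bordered) / det(S)

def poisson_example(theta, xmax=100):
    """A_i = E[d(X) phi_i], S = [E phi_i phi_j] and Var d(X) for d(x) = x(x - 1)."""
    xs = range(xmax)
    pmf = [math.exp(-theta) * theta ** x / math.factorial(x) for x in xs]
    phi = [[x / theta - 1.0 for x in xs],                          # phi_1
           [(x / theta - 1.0) ** 2 - x / theta ** 2 for x in xs]]  # phi_2
    d = [x * (x - 1) for x in xs]

    def e(values):
        return sum(w * v for w, v in zip(pmf, values))

    A = [e([di * pi for di, pi in zip(d, phi[i])]) for i in range(2)]
    S = [[e([a * b for a, b in zip(phi[i], phi[j])]) for j in range(2)] for i in range(2)]
    var_d = e([di * di for di in d]) - e(d) ** 2
    return A, S, var_d

A, S, var_d = poisson_example(1.5)
bound1 = bhattacharyya_bound(A[:1], [S[0][:1]])  # k = 1: Cramér-Rao bound
bound2 = bhattacharyya_bound(A, S)                # k = 2
print(var_d, bound1, bound2)                      # approximately 18, 13.5, 18
```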


Results. Cramér [2] proved that if the equality in (2) holds for the case $k = 1$, then $d(x)$ must be a linear function of $\phi_1$. Girshick and Savage [3] defined an exponential family of density functions and showed, under certain restrictions, the existence of a function whose variance achieves the Cramér-Rao bound. The same exponential family is considered in the following theorem.

THEOREM 1. If the conditions in (1) are satisfied for $f(x, \theta)$ and $d(x)$, and if $\sigma_d^2 > 0$ for all $\theta$, then a necessary and sufficient condition that $\sigma_d^2$ achieve the Cramér-Rao bound is that $f(x, \theta) = \exp[d(x)g(\theta) + g_0(\theta) + h(x)]$, where $g'(\theta) \ne 0$.

If the maximum likelihood estimate $\hat\theta$ is given by the root of the equation $(\partial/\partial\theta)\ln f(x, \theta) = 0$, and if $d(x)$ is an unbiased estimate of $\theta$, then, in addition, $g_0'(\theta) = -\theta g'(\theta)$ and $d(x) = \hat\theta$.

PROOF. We can, without loss of generality, write the density function of $X$ in the form $f(x, \theta) = \exp[u(x, \theta)]$. Now, if the variance of $d(x)$ achieves the Cramér-Rao lower bound, (2) becomes an equality, and this is a statement to the effect that the correlation coefficient of $d(x)$ and $\phi_1$ is unity. That is, $d(x)$ is a linear function of $\phi_1$, except perhaps on a set of $\mu$ measure zero, and we can write

$$d(x) = a_0(\theta) + a_1(\theta)\phi_1, \tag{3}$$

where

$$\phi_1 = \frac{\partial u(x, \theta)}{\partial\theta} = u'(x, \theta).$$

Since $\sigma_d^2 > 0$, it follows that $d(x)$ is not a constant, so that $a_1(\theta) \ne 0$. Therefore, we can solve (3) for $u'(x, \theta)$, getting

$$u'(x, \theta) = d(x)a_1^{-1}(\theta) - a_0(\theta)a_1^{-1}(\theta),$$

and, on integrating with respect to $\theta$, $u(x, \theta) = d(x)g(\theta) + g_0(\theta) + h(x)$.

To check sufficiency, we note that if $f(x, \theta) = \exp[d(x)g(\theta) + g_0(\theta) + h(x)]$, then $\phi_1 = d(x)g'(\theta) + g_0'(\theta)$ and $\phi_1$ is the linear function of $d(x)$ given by (3). That is, the correlation coefficient of $d(x)$ and $\phi_1$ is unity, and so the variance of $d(x)$ achieves the Cramér-Rao bound.

Now, observing that $E\phi_1 = 0$, we get, from (3), $Ed(X) = a_0(\theta) + a_1(\theta)E\phi_1 = a_0(\theta)$. If $d(x)$ is an unbiased estimate of $\theta$, then $a_0(\theta) = \theta$ and, substituting these values in (3),

$$d(x) = \theta + a_1(\theta)[d(x)g'(\theta) + g_0'(\theta)]. \tag{4}$$





CRAMER-RAO AND BHATTACHARYYA BOUNDS 383 


Since $d(x)$ does not contain $\theta$, it follows that $a_1(\theta) = [g'(\theta)]^{-1}$ and $g_0'(\theta) = -\theta g'(\theta)$. To complete the proof of the theorem, we must consider the maximum likelihood estimate $\hat\theta$. The coefficient of $a_1(\theta)$ in (4) is just the derivative with respect to $\theta$ of the log of the likelihood function. Therefore, $d(x)g'(\hat\theta) + g_0'(\hat\theta) = 0$. But since the right member of (4) does not contain $\theta$, we can substitute $\hat\theta$ for $\theta$ in (4) and obtain $d(x) = \hat\theta$. This completes the proof of the theorem.
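The conclusions of Theorem 1 can be illustrated numerically for three familiar members of this exponential family with $d(x) = x$: the Poisson, the exponential with mean $\theta$, and the normal with mean $\theta$ and a known variance (the value 2 is an assumption). The sketch checks $g_0'(\theta) + \theta g'(\theta) = 0$ by finite differences and solves the likelihood equation by bisection, recovering $\hat\theta = d(x)$ for a hypothetical observed value.

```python
import math

SIGMA2 = 2.0  # hypothetical known variance in the normal case

# (g, g0) with f(x, theta) = exp[d(x) g(theta) + g0(theta) + h(x)] and d(x) = x
FAMILIES = {
    "Poisson": (math.log, lambda th: -th),
    "exponential (mean theta)": (lambda th: -1.0 / th, lambda th: -math.log(th)),
    "normal (mean theta, known variance)": (lambda th: th / SIGMA2,
                                            lambda th: -th * th / (2.0 * SIGMA2)),
}

def deriv(fn, th, h=1e-6):
    """Central finite-difference derivative."""
    return (fn(th + h) - fn(th - h)) / (2.0 * h)

def mle(d, g, g0, lo=0.1, hi=50.0, iters=100):
    """Bisection on the likelihood equation d g'(theta) + g0'(theta) = 0,
    assuming the score is positive at lo and negative at hi."""
    def score(th):
        return d * deriv(g, th) + deriv(g0, th)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, d_obs = 2.5, 3.7
for name, (g, g0) in FAMILIES.items():
    gap = deriv(g0, theta) + theta * deriv(g, theta)  # g0'(theta) + theta g'(theta), near 0
    print(name, gap, mle(d_obs, g, g0))               # ML estimate should equal d(x) = 3.7
```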

Theorem 1 raises two important questions. First, we showed that if the vari- 
ance of d(x) achieves the Cramér-Rao bound, then the density function 
is given by 


$$f(x, \theta) = \exp[d(x)g(\theta) + g_0(\theta) + h(x)]. \tag{5}$$


For such a function, we might now investigate the possibility that the variance 
of some function of d(x) might achieve one of the Bhattacharyya bounds, even 
though it does not achieve the Cramér-Rao bound. It also seems possible that 
if the exponent in (5) can be expressed as a polynomial in d(x), with coefficients 
depending only on @, then some function of d(x) might achieve one of the higher 
bounds. This point is covered by Theorem 2. 

A second question is concerned with maximum likelihood estimates. 
In Theorem 1 we showed that if the variance of an unbiased estimate d(x) 
achieves the Cramér-Rao bound, then d(x) is both minimum variance unbiased 
and maximum likelihood, and we might now ask if this result holds for the 
Bhattacharyya bounds. The answer is that it does not. 

To illustrate this, suppose that the variance of the unbiased estimate $d(x)$ achieves the second Bhattacharyya bound, but not the Cramér-Rao bound. In this case, (2) becomes an equality, and the multiple correlation coefficient of $d(x)$ with $\phi_1$ and $\phi_2$ is unity. That is, we can write

$$d(x) = a_0(\theta) + a_1(\theta)\phi_1 + a_2(\theta)\phi_2. \tag{6}$$

Now for any $k$, $E\phi_k = 0$, so that in taking expected values of both sides of (6), we get $Ed(X) = a_0(\theta)$. If $d(x)$ is unbiased, $a_0(\theta) = \theta$. Substituting the maximum likelihood estimate $\hat\theta$ for $\theta$, as in the proof of Theorem 1, (6) becomes $d(x) = \hat\theta + a_2(\hat\theta)\phi_2(\hat\theta)$. In general, the term $a_2(\hat\theta)\phi_2(\hat\theta)$ will not vanish. That is, if $\hat\theta$ maximizes the likelihood function, then $\hat\theta$ will not, in general, be a solution of

$$\phi_2 = \frac{1}{f}\frac{\partial^2 f}{\partial\theta^2} = 0.$$

Hence, we would expect that the minimum variance unbiased estimate would not be the same as the maximum likelihood estimate.

If $d(x)$ is a sufficient statistic, then the maximum likelihood estimate is necessarily a function of $d(x)$, and the same thing may be said of any estimate whose variance achieves the $k$th Bhattacharyya bound. For such an estimate $d_1(x)$, (2) becomes an equality, and we write $d_1(x) = a_0(\theta) + \sum_{i=1}^k a_i(\theta)\phi_i$. The statement that $d_1(x)$ is a function of $d(x)$ will be proved if we show that $\phi_i$ depends only on $d(x)$ and $\theta$.







Suppose that the density in question satisfies Neyman's criterion for sufficiency. That is, $f(x, \theta) = h[d(x), \theta]\,g(x)$. Then

$$\phi_i = \frac{1}{f}\frac{\partial^i f}{\partial\theta^i} = \frac{1}{hg}\,g\,\frac{\partial^i h}{\partial\theta^i} = \frac{1}{h}\frac{\partial^i h}{\partial\theta^i},$$

and $\phi_i$ is a function of the sufficient statistic $d(x)$ and $\theta$.

THEOREM 2. Consider a density function of the form

$$f(x, \theta) = \exp\left\{\sum_{i=0}^n [u(x)]^{a_i}g_i(\theta) + v(x)\right\},$$

where the real numbers $a_i$, $i = 0, \ldots, n$, satisfy the conditions $0 = a_0 < a_1 < \cdots < a_n$, and $g_n'(\theta) \ne 0$. If the regularity conditions in (1) are satisfied for an estimate $d(x)$ of $\theta$ and the density $f(x, \theta)$, and if $\sigma_d^2 > 0$ and $\sigma_d^2$ achieves the $k$th Bhattacharyya bound but not the $(k-1)$th bound, then the density may be expressed in the form $f(x, \theta) = \exp[t(x)g(\theta) + g_0(\theta) + h(x)]$ and $d(x)$ is a polynomial in $t(x)$ of degree $k$. Further, the variance of any polynomial in $t(x)$ of degree $k$ will achieve the $k$th bound.
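PROOF. In order to simplify notation, we define a general function

$$P_b[u(x)] = [u(x)]^b W_r(\theta) + \sum_{i=1}^{r-1}[u(x)]^{b_i}W_i(\theta),$$

where the real numbers $b_i$ and $b$ satisfy the conditions $0 \le b_i \le b$, $i = 1, \ldots, (r-1)$. In the expression $P_b[u(x)]$, we will not be concerned with the particular values of $r$ or $W_i(\theta)$, $i = 1, \ldots, r$. In fact, we do not exclude the possibility that $W_r(\theta) = 0$.

Notice that if the numbers $b_i$ and $b$, $i = 1, \ldots, (r-1)$, are integers, and if $W_r(\theta) \ne 0$, then $P_b[u(x)]$ is just a polynomial in $u(x)$ of degree $b$, with coefficients depending on $\theta$.

Now, for the density function $f(x, \theta) = \exp\{\sum_{i=0}^n [u(x)]^{a_i}g_i(\theta) + v(x)\}$, we will show that

$$\phi_h = \left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}^h + P_{a_n(h-1)}[u(x)]. \tag{7}$$

For any integer value of $h$, $h \ge 2$,

$$\frac{\partial\phi_{h-1}}{\partial\theta} = \frac{\partial}{\partial\theta}\left[\frac{1}{f}\frac{\partial^{h-1}f}{\partial\theta^{h-1}}\right] = \frac{1}{f}\frac{\partial^h f}{\partial\theta^h} - \frac{1}{f}\frac{\partial f}{\partial\theta}\cdot\frac{1}{f}\frac{\partial^{h-1}f}{\partial\theta^{h-1}},$$

and, using the definition of $\phi_i$ in (1), we get the recursion relation $\phi_h = \phi_{h-1}' + \phi_1\phi_{h-1}$. Now,

$$\phi_2 = \phi_1' + \phi_1^2 = \left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}^2 + \sum_{i=0}^n [u(x)]^{a_i}g_i''(\theta),$$

so that $\phi_2$ is of the form given in (7).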
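The recursion $\phi_h = \phi_{h-1}' + \phi_1\phi_{h-1}$ can be checked for a concrete density. The sketch below treats the Poisson probability $f(x, \theta)$ as a smooth function of $\theta$ (here $u(x) = x$ and $a_1 = 1$), builds $\phi_2$ and $\phi_3$ from the recursion, and compares them with $(1/f)\,\partial^h f/\partial\theta^h$ computed by central finite differences. The point $x = 3$, $\theta = 1.7$ and the step size are arbitrary choices.

```python
import math

def f(x, th):
    """Poisson probability, viewed as a smooth function of theta."""
    return math.exp(-th) * th ** x / math.factorial(x)

def phi1(x, th):
    return x / th - 1.0

def phi2_recursive(x, th):
    # phi_2 = phi_1' + phi_1^2, with phi_1' = -x / theta^2
    return -x / th ** 2 + phi1(x, th) ** 2

def phi3_recursive(x, th):
    # phi_3 = phi_2' + phi_1 phi_2, with phi_2' = 2x / theta^3 - 2 (x/theta - 1) x / theta^2
    dphi2 = 2.0 * x / th ** 3 - 2.0 * phi1(x, th) * x / th ** 2
    return dphi2 + phi1(x, th) * phi2_recursive(x, th)

def phi_direct(x, th, order, h=1e-3):
    """(1/f) d^order f / d theta^order by central finite differences (order 2 or 3)."""
    if order == 2:
        num = (f(x, th + h) - 2.0 * f(x, th) + f(x, th - h)) / h ** 2
    elif order == 3:
        num = (f(x, th + 2 * h) - 2.0 * f(x, th + h)
               + 2.0 * f(x, th - h) - f(x, th - 2 * h)) / (2.0 * h ** 3)
    else:
        raise ValueError("order must be 2 or 3")
    return num / f(x, th)

x, th = 3, 1.7
print(phi2_recursive(x, th), phi_direct(x, th, 2))  # should agree closely
print(phi3_recursive(x, th), phi_direct(x, th, 3))  # should agree closely
```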







Suppose next that $\phi_{h-1}$ is also given by (7). Then, using (7) and the recursion relation,

$$\begin{aligned}
\phi_h &= (h-1)\left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}^{h-2}\left\{\sum_{i=0}^n [u(x)]^{a_i}g_i''(\theta)\right\} + P_{a_n(h-2)}[u(x)] \\
&\quad + \left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}\left(\left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}^{h-1} + P_{a_n(h-2)}[u(x)]\right) \\
&= \left\{\sum_{i=0}^n [u(x)]^{a_i}g_i'(\theta)\right\}^h + P_{a_n(h-1)}[u(x)].
\end{aligned}$$

That is, $\phi_h$ is given by (7), and so (7) holds for all positive integral values of $h$.

If $d(x)$ is an estimate such that $\sigma_d^2 > 0$ and $\sigma_d^2$ achieves the $k$th Bhattacharyya bound but not the $(k-1)$th bound, then, by the same reasoning that led to (4), we can write

$$d(x) = a_0(\theta) + \sum_{i=1}^k a_i(\theta)\phi_i, \tag{8}$$

where $a_k(\theta) \ne 0$. Now, substituting (7) into (8), we get

$$d(x) = a_0(\theta) + \sum_{i=1}^k a_i(\theta)\left(\left\{\sum_{j=0}^n [u(x)]^{a_j}g_j'(\theta)\right\}^i + P_{a_n(i-1)}[u(x)]\right). \tag{9}$$

Notice that the right side of (9) can be expressed as a sum of terms, each term being a power of $u(x)$ multiplied by a coefficient which does not depend on $x$. But $d(x)$ itself is free of $\theta$, so that the functions $a_i(\theta)$ must be chosen so that the coefficients in this series are free of $\theta$.

Suppose now that (9) is expressed in descending powers of $u(x)$. We will pay special attention to the expression


$$a_k(\theta)\left\{\sum_{j=0}^n [u(x)]^{a_j}g_j'(\theta)\right\}^k \tag{10}$$

because this contains the higher powers of $u(x)$, and hence the first terms of the expansion in descending powers of $u(x)$.

Recalling that $0 = a_0 < a_1 < \cdots < a_n$, we observe that the first term in the expansion is

$$a_k(\theta)[u(x)]^{ka_n}[g_n'(\theta)]^k.$$

Since the coefficient of $[u(x)]^{ka_n}$ is independent of $\theta$, it follows that $a_k(\theta) = C_n[g_n'(\theta)]^{-k}$, where $C_n$ is a constant. Substitute this value of $a_k(\theta)$ into (10), and (10) becomes

$$C_n\left\{\sum_{j=0}^n [u(x)]^{a_j}\frac{g_j'(\theta)}{g_n'(\theta)}\right\}^k. \tag{11}$$


The next term in the expansion of (9) in descending powers of u(x), whose 





386 A. V. FEND 
coefficient might depend on @, is obtained from (11). It is 


gn-s(8) 1(8) an(k—1) +an_ 1 
Cr 76) (u(x)] 


Again, the coefficient must be independent of 6, so that Gn- 1(0) = Cy. 19.(9)> 
where C,_; is a constant. 
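Substituting this value in (11), we get

$$C_n\Big\{[u(x)]^{\alpha_n} + C_{n-1}[u(x)]^{\alpha_{n-1}} + \sum_{j=0}^{n-2}[u(x)]^{\alpha_j}\frac{g_j(\theta)}{g_n(\theta)}\Big\}^{k}. \tag{12}$$

We see that the next term in the expansion whose coefficient might depend on $\theta$ is

$$\Big\{\text{Constant} + C_n k\,\frac{g_{n-2}(\theta)}{g_n(\theta)}\Big\}[u(x)]^{\alpha_n(k-1)+\alpha_{n-2}},$$

and this implies that $g_{n-2}(\theta) = C_{n-2}g_n(\theta)$, where $C_{n-2}$ is a constant. Repeating this procedure on the remaining functions, it follows that

$$g_j(\theta) = C_j\,g_n(\theta), \qquad j = 1, \cdots, n-1. \tag{13}$$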


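But $\phi = \sum_{j=0}^{n}[u(x)]^{\alpha_j}g_j(\theta)$, and substituting (13) in this expression,

$$\phi = \sum_{j=1}^{n-1}[u(x)]^{\alpha_j}g_n(\theta)C_j + [u(x)]^{\alpha_n}g_n(\theta) + g_0(\theta) = g_n(\theta)\Big\{\sum_{j=1}^{n-1}[u(x)]^{\alpha_j}C_j + [u(x)]^{\alpha_n}\Big\} + g_0(\theta). \tag{14}$$

In (14), let $t(x) = \sum_{j=1}^{n-1}[u(x)]^{\alpha_j}C_j + [u(x)]^{\alpha_n}$ and $g_n(\theta) = g'(\theta)$. Writing the $x$-free term of (14) as the derivative $g_0'(\theta)$ of a function $g_0(\theta)$, we can say that $\phi = t(x)g'(\theta) + g_0'(\theta)$. Since $\phi$ is the derivative with respect to $\theta$ of the exponent of the density function stated in the theorem, we get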


$$f(x,\theta) = \exp[t(x)g(\theta) + g_0(\theta) + h(x)],$$


and this concludes the proof of the first part of the theorem. 
If $f(x,\theta)$ is of the above form, then (7) becomes $\phi_h = [t(x)g'(\theta) + g_0'(\theta)]^h + P_{h-1}[t(x)]$. Here $P_{h-1}[t(x)]$ is a polynomial in $t(x)$ of degree $h-1$ whose coefficients depend only on $\theta$.
Using this result and (8),

$$d(x) = a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\big\{[t(x)g'(\theta) + g_0'(\theta)]^{i} + P_{i-1}[t(x)]\big\}, \tag{15}$$

so $d(x)$ is a polynomial in $t(x)$ of degree $k$.

Finally, we will show that if $d(x)$ is any polynomial in $t(x)$ of degree $k$, then the functions $a_i(\theta)$, $i = 0, 1, \cdots, k$, can be chosen so that (15) holds. To do this, let $P_{i-1}[t(x)] = \sum_{j=0}^{i-1} t^j(x)\,u_{ij}(\theta)$. As stated before, we will not be concerned





CRAMER-RAO AND BHATTACHARYYA BOUNDS 387 


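with the particular form of $u_{ij}(\theta)$. Substituting this expression in (15), and writing $t$, $g'$, $g_0'$ for $t(x)$, $g'(\theta)$, $g_0'(\theta)$, we get

$$\begin{aligned} d(x) &= a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\Big\{(tg' + g_0')^{i} + \sum_{j=0}^{i-1} t^{j}u_{ij}\Big\}\\ &= a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\Big\{\sum_{j=0}^{i}\binom{i}{j}(tg')^{j}(g_0')^{i-j} + \sum_{j=0}^{i-1} t^{j}u_{ij}\Big\}\\ &= a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\Big\{(tg')^{i} + \sum_{j=0}^{i-1} t^{j}\Big[\binom{i}{j}(g')^{j}(g_0')^{i-j} + u_{ij}\Big]\Big\}\\ &= t^{k}(g')^{k}a_k(\theta) + \cdots + t^{n}\Big\{a_n(\theta)(g')^{n} + \sum_{i=n+1}^{k} a_i(\theta)\Big[\binom{i}{n}(g')^{n}(g_0')^{i-n} + u_{in}\Big]\Big\} + \cdots\\ &\qquad + a_0(\theta) + \sum_{i=1}^{k}\big[(g_0')^{i} + u_{i0}\big]a_i(\theta). \end{aligned}$$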


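Choose arbitrary constants $C_0, C_1, \cdots, C_k$, and let $a_k(\theta) = C_k[g'(\theta)]^{-k}$. For $0 < n < k$, let

$$a_n(\theta) = \frac{C_n - \sum_{i=n+1}^{k} a_i(\theta)\big[\binom{i}{n}(g')^{n}(g_0')^{i-n} + u_{in}\big]}{(g')^{n}}.$$

These are determined successively for $n = k-1, k-2, \cdots, 1$. Finally, let

$$a_0(\theta) = C_0 - \sum_{i=1}^{k}\big[(g_0')^{i} + u_{i0}\big]a_i(\theta).$$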


Using these functions in (15) we get $d(x) = \sum_{i=0}^{k} C_i t^i(x)$. Since the constants $C_0, \cdots, C_k$ are completely arbitrary, any polynomial in $t(x)$ of degree $k$ can be written in the form (15). This completes the proof of the last part of Theorem 2.
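The last step is a triangular back-substitution. First $a_k$ is fixed; each $a_n$ then depends only on $a_{n+1}, \cdots, a_k$. The Python sketch below illustrates this at a single fixed $\theta$. It treats $g'(\theta)$, $g_0'(\theta)$ (the $x$-free term paired with $t(x)g'(\theta)$ in (15)) and the $u_{ij}(\theta)$ as given numbers. The function names and sample values are hypothetical. The sketch then evaluates the right side of (15) and checks that it reproduces $\sum_i C_i t^i$.

```python
from math import comb


def solve_coefficients(C, gp, g0p, u):
    """Back-substitute for a_0, ..., a_k at one fixed theta (illustrative).

    C   -- target coefficients [C_0, ..., C_k] of the polynomial in t
    gp  -- the number g'(theta), assumed nonzero
    g0p -- the number g_0'(theta)
    u   -- dict mapping (i, j), 0 <= j < i <= k, to u_ij(theta)
    """
    k = len(C) - 1
    a = [0.0] * (k + 1)
    a[k] = C[k] / gp ** k                      # coefficient of t^k
    for n in range(k - 1, 0, -1):              # n = k-1, ..., 1
        s = sum(a[i] * (comb(i, n) * gp ** n * g0p ** (i - n) + u[(i, n)])
                for i in range(n + 1, k + 1))
        a[n] = (C[n] - s) / gp ** n            # coefficient of t^n
    a[0] = C[0] - sum((g0p ** i + u[(i, 0)]) * a[i] for i in range(1, k + 1))
    return a


def rhs_of_15(t, a, gp, g0p, u):
    """Right side of (15) with P_{i-1}(t) = sum_{j<i} t^j u_ij."""
    total = a[0]
    for i in range(1, len(a)):
        P = sum(t ** j * u[(i, j)] for j in range(i))
        total += a[i] * ((t * gp + g0p) ** i + P)
    return total


C = [1.0, -2.0, 0.5, 3.0]                      # target: 1 - 2t + 0.5t^2 + 3t^3
u = {(i, j): 0.1 * (i + 2 * j) + 0.05 for i in range(1, 4) for j in range(i)}
a = solve_coefficients(C, gp=1.5, g0p=-0.7, u=u)
for t in (-2.0, 0.0, 1.0, 2.5):
    target = sum(c * t ** m for m, c in enumerate(C))
    print(t, rhs_of_15(t, a, 1.5, -0.7, u), target)
```

The values are solved in descending order because the coefficient of $t^n$ in (15) involves only $a_n, \cdots, a_k$.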

Theorem 2 provides a method for finding uniformly minimum variance unbiased estimates. If $f(x,\theta)$ is of the form described in Theorem 2, we look for an unbiased polynomial in $t(x)$. If this polynomial is of degree $k$, its variance achieves the $k$th Bhattacharyya bound and it is the best unbiased estimate.

On the other hand, we know that if the variance of $d(x)$ achieves the $k$th bound, then $d(x)$ is necessarily a polynomial in $t(x)$. Therefore, if we find that no such polynomial is unbiased, there would seem to be no value in calculating any of the Bhattacharyya bounds if we are interested only in minimum variance unbiased estimates. The following examples illustrate uses of Theorems 1 and 2.

EXAMPLE 1. Let $X$ have density $\theta^{-1/n}\exp[-x\theta^{-1/n}]$, $0 < x$, $0 < \theta$, where $n$ is a positive integer, and suppose that an estimate is wanted for $\theta$. We write the density as $\exp[-x\theta^{-1/n} - (1/n)\log\theta]$. In the notation of Theorem 2, $t(x) = x$, $g(\theta) = -\theta^{-1/n}$ and $g_0(\theta) = -(1/n)\log\theta$. If $n = 1$, then $g_0'(\theta) = -\theta g'(\theta)$, so by Theorem 1 the estimate $X$ is minimum variance unbiased.







If $n > 1$, the estimate $x^n/n!$ is an unbiased estimate of $\theta$. Since it is a polynomial in $x$ of degree $n$, Theorem 2 shows that its variance achieves the $n$th Bhattacharyya bound. Hence it is minimum variance unbiased.

By straightforward calculations we find that the maximum likelihood estimate is $\hat\theta = x^n$. If $n = 1$, $\hat\theta$ is unbiased, as stated in Theorem 1, but if $n > 1$, then $\hat\theta$ is a biased estimate.
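A quick simulation makes the contrast between $x^n/n!$ and the maximum likelihood estimate $x^n$ concrete. The sketch below is only an illustration and is not part of the original argument. It uses the fact that, under the density of Example 1, $X$ is exponential with rate $\theta^{-1/n}$, so $EX^n = n!\,\theta$. The choice $n = 2$, $\theta = 3$, the seed and the simulation size are arbitrary.

```python
import math
import random


def example1_means(n, theta, reps=200_000, seed=1):
    """Monte Carlo means of x^n / n! and of the MLE x^n under Example 1.

    X has density theta^(-1/n) exp(-x theta^(-1/n)): an exponential law
    with rate theta^(-1/n), so E[X^n] = n! * theta.
    """
    rng = random.Random(seed)
    rate = theta ** (-1.0 / n)
    unbiased_sum = 0.0
    mle_sum = 0.0
    for _ in range(reps):
        x = rng.expovariate(rate)
        unbiased_sum += x ** n / math.factorial(n)
        mle_sum += x ** n
    return unbiased_sum / reps, mle_sum / reps


unbiased_mean, mle_mean = example1_means(n=2, theta=3.0)
print(unbiased_mean)  # close to theta = 3
print(mle_mean)       # close to 2! * theta = 6, so the MLE is biased when n > 1
```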

EXAMPLE 2. Let $X$ have density $\theta^{-n}\exp[-x\theta^{-n}]$, $0 < x$, $0 < \theta$, where $n$ is an integer, $n > 1$, and an estimate is wanted for $\theta$. Writing the density as in Theorem 2 gives $\exp[-x\theta^{-n} - n\log\theta]$. By Theorem 2, if the variance of an estimate $d(x)$ achieves the $k$th bound, $d(x)$ is a polynomial in $x$. But for this density, $EX^c = c!\,\theta^{nc}$ for any non-negative integer $c$. Since $n > 1$, no polynomial in $x$ is an unbiased estimate of $\theta$. This does not imply that no minimum variance unbiased estimate exists. It does mean that the variance of an unbiased estimate cannot achieve the $k$th bound for any $k$.






THE LAGRANGIAN MULTIPLIER TEST 


By S. D. Silvey
University of Glasgow 

1. Introduction. One of the most frequent problems in practical statistics is the following. On the basis of a number of independent observations on a random variable, we must decide whether a finite dimensional parameter in the distribution function of that variable belongs to a proper subset $\omega$ of the set $\Omega$ of possible parameters. Naturally this problem has received considerable attention. The main method currently applied to it is the well-known Neyman-Pearson likelihood ratio test. Direct application of this test involves finding the supremum of the likelihood function in the set $\omega$. This in turn often involves solving restricted likelihood equations that contain a Lagrangian multiplier. The same set of equations must be solved if, irrespective of the likelihood ratio test, a maximum likelihood estimate in the set $\omega$ of the unknown parameter is desired. Rather surprisingly, since the problem is of such frequent occurrence, little seems to have appeared in statistical literature on such restricted maximum likelihood estimates. The main results in this field are contained in a recent paper by Aitchison and Silvey [1].

In that paper the authors introduced, on an intuitive basis, a method of testing whether the true parameter belongs to $\omega$. The method is based on the distribution of a random Lagrangian multiplier that appears in the restricted likelihood equations. The object of the present paper is to discuss this Lagrangian multiplier test. To do so, we must consider how the results of the previous paper change when the true parameter does not belong to the set $\omega$, because only in this way can we obtain any notion of the power of the test. Discussion of this point forms the initial part of the present paper. We will then show the connection between the Lagrangian multiplier test and the likelihood ratio test. Finally, since situations often arise in practice where the information matrix is singular, we will consider how the Lagrangian multiplier test must be adapted to meet this contingency.

The approach adopted by Aitchison and Silvey [1] in the discussion of restricted estimates is essentially Cramér's approach [4] to maximum likelihood estimates. That is, attention is concentrated on solutions of the likelihood equations rather than on genuine maximum likelihood estimates. Such an approach is unsuitable here, since we do not necessarily assume that the true parameter belongs to the subset $\omega$. Instead we will use the method of Wald [7] in his discussion of the consistency of maximum likelihood estimators. As Kraft and Le Cam [5] have pointed out, Wald's approach to unrestricted maximum likelihood estimation is much more illuminating than Cramér's and, not surprisingly, this is still true of restricted estimation. Unfortunately the change in viewpoint requires some changes in the notation used by Aitchison and Silvey. We introduce these now, in describing mathematically the situation to be discussed.

Received March 14, 1958.


2. Notation. The basic situation in which we shall be interested is described 
mathematically as follows. 


To each point $\theta = (\theta_1, \theta_2, \cdots, \theta_s)$ in some subset $\Omega$ of $s$-dimensional Euclidean space $R^s$ there corresponds a distribution function $F(\cdot,\theta)$ defined on $R^a$, where $a$ is some given integer. A random variable $X$ taking values in $R^a$ has distribution function $F(\cdot,\theta_0)$. Here $\theta_0$ is known to belong to $\Omega$ but is otherwise unknown. It is suspected, however, that $\theta_0$ belongs to a subset $\omega = \Omega \cap \{\theta : h(\theta) = 0\}$ of $\Omega$, where $h = (h_1, h_2, \cdots, h_r)$ is a well-behaved function from $R^s$ into $R^r$, $r < s$.

We will assume, as is usual, that for all $\theta \in \Omega$, $F(\cdot,\theta)$ is either discrete or absolutely continuous, and admits an elementary probability law $f(\cdot,\theta)$. Then for a given sequence $x = (x_1, x_2, \cdots, x_n, \cdots)$ of independent observations on $X$, the log-likelihood function $\log L_n(x,\cdot)$ is defined on $\Omega$ by $\log L_n(x,\theta) = \sum_{i=1}^{n}\log f(x_i,\theta)$. By a maximum likelihood estimate of $\theta_0$ in any subset $\omega^*$ of $\Omega$, we mean an element $\hat\theta_n(x,\omega^*)$ of $\omega^*$ such that

$$\log L_n(x, \hat\theta_n(x,\omega^*)) = \sup_{\theta\in\omega^*}\log L_n(x,\theta).$$


If a single-valued function $\hat\theta_n(\cdot,\omega^*)$ is thus defined for almost all $x$, then $\hat\theta_n(\cdot,\omega^*)$ is a random variable called a maximum likelihood estimator of $\theta_0$ in $\omega^*$. By "almost all $x$" we mean almost all with respect to a particular probability measure on the sequence space of points $x$. That measure treats the components of a sequence $x$ as independent observations on a random variable $X$ with distribution function $F(\cdot,\theta_0)$. Similarly, "almost all $t \in R^a$" means almost all with respect to the probability measure defined on $R^a$ by $F(\cdot,\theta_0)$.

We denote by $B_\theta$ the matrix whose $(i,j)$th element is $\int_{R^a}\frac{\partial\log f(t,\theta)}{\partial\theta_i}\,\frac{\partial\log f(t,\theta)}{\partial\theta_j}\,dF(t,\theta)$. Further, $H_\theta$ will denote the $s \times r$ matrix $(\partial h_j(\theta)/\partial\theta_i)$. For any real function $\phi$ defined on $R^s$, $D\phi(\theta)$ will denote the column vector whose $i$th component is $\partial\phi(\theta)/\partial\theta_i$. Likewise, $D^2\phi(\theta)$ will denote the $s \times s$ matrix whose $(i,j)$th component is $\partial^2\phi(\theta)/\partial\theta_i\partial\theta_j$. Generally, column vectors corresponding to points in Euclidean space will be printed in the corresponding boldface type. For example, the column vector $\boldsymbol\theta$ corresponds to the point $\theta$.
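Since $B_\theta$ is the expected outer product of the score vector, averaging that outer product over simulated observations approximates it. The sketch below does this for an illustrative model that is not in the paper: $X \sim N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma)$. For this model $B_\theta = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The function name, seed and settings are hypothetical.

```python
import random


def information_matrix_mc(mu, sigma, reps=200_000, seed=2):
    """Monte Carlo estimate of B_theta for the toy model X ~ N(mu, sigma^2),
    theta = (mu, sigma):  B_theta[i][j] = E[dlog f/dtheta_i * dlog f/dtheta_j]."""
    rng = random.Random(seed)
    B = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(reps):
        x = rng.gauss(mu, sigma)
        score = [(x - mu) / sigma ** 2,                       # d/d mu
                 -1.0 / sigma + (x - mu) ** 2 / sigma ** 3]   # d/d sigma
        for i in range(2):
            for j in range(2):
                B[i][j] += score[i] * score[j]
    return [[v / reps for v in row] for row in B]


B = information_matrix_mc(mu=0.5, sigma=2.0)
print(B)  # the exact B_theta here is [[0.25, 0], [0, 0.5]]
```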

We will be interested initially in the emergence of $\hat\theta_n(x,\omega)$ as a solution of the equations

$$n^{-1}D\log L_n(x,\theta) + H_\theta\lambda = 0,\qquad h(\theta) = 0,$$

where $\lambda$ is a Lagrangian multiplier in $R^r$, and more generally in the restricted maximum likelihood estimator $\hat\theta_n(\cdot,\omega)$.
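To illustrate these equations, consider a toy model that is not in the paper: $X = (X_1, X_2) \sim N(\theta, I_2)$ with the single constraint $h(\theta) = \theta_1 - \theta_2$, so that $s = 2$ and $r = 1$. Then $n^{-1}D\log L_n(x,\theta) = \bar x - \theta$ and $H_\theta = (1, -1)'$. The restricted likelihood equations are therefore linear, and the Python sketch below solves them by Gaussian elimination. The helper names are hypothetical. The solution should be $\theta_1 = \theta_2 = (\bar x_1 + \bar x_2)/2$ with $\lambda = (\bar x_2 - \bar x_1)/2$.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][c] * z[c] for c in range(r + 1, m))) / M[r][r]
    return z


def restricted_mle(xbar):
    """Solve n^{-1} D log L_n + H lambda = 0, h(theta) = 0 in the toy model.

    With n^{-1} D log L_n(x, theta) = xbar - theta and H = (1, -1)', the
    unknowns (theta_1, theta_2, lambda) satisfy a 3 x 3 linear system.
    """
    A = [[-1.0, 0.0, 1.0],     # xbar_1 - theta_1 + lambda = 0
         [0.0, -1.0, -1.0],    # xbar_2 - theta_2 - lambda = 0
         [1.0, -1.0, 0.0]]     # h(theta) = theta_1 - theta_2 = 0
    b = [-xbar[0], -xbar[1], 0.0]
    return solve_linear(A, b)


theta1, theta2, lam = restricted_mle([1.3, 0.7])
print(theta1, theta2, lam)  # theta1 = theta2 = 1.0 and lam = -0.3, up to rounding
```

In non-linear models the same equations would have to be solved iteratively, for instance by the Newton-type method mentioned in [1].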


3. $\hat\theta_n(x,\omega)$ and the likelihood equations. Naturally this discussion will require various assumptions concerning $F$ and $h$. The assumptions we introduce are not designed to achieve complete mathematical generality. We hope they are such that they will not obscure the over-all mathematical picture and will be satisfied in many practical problems. The first of these assumptions is as follows.

Assumption 1. For every $\theta \in \Omega$, $z(\theta) = \int_{R^a}\log f(t,\theta)\,dF(t,\theta_0)$ exists.

The whole problem of maximum likelihood estimation, restricted and unrestricted, is closely bound up with the behaviour of the function $z$. The reason is that the Law of Large Numbers ensures that, for each $\theta$, the sequence $(n^{-1}\log L_n(x,\theta))$ converges to $z(\theta)$ for almost all $x$. Suppose further that this convergence is uniform with respect to $\theta$. Then for large $n$ and most $x$, $n^{-1}\log L_n(x,\cdot)$ will be uniformly near $z$. Under suitable conditions it will attain its supremum in $\omega$ near the point (if such exists) where $z$ attains its supremum in $\omega$. The assumptions introduced next are designed to achieve this desirable situation.
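The convergence of $n^{-1}\log L_n(x,\cdot)$ to $z$ can be seen numerically in a simple case. Purely for illustration, the sketch below assumes the model $N(\theta, 1)$ with true value $\theta_0 = 0$. For this model $z(\theta) = -\tfrac12\log(2\pi) - \tfrac12[1 + (\theta - \theta_0)^2]$. The sketch compares the two functions on a small grid and checks that the grid maximizer of the average log-likelihood is $\theta_0$. The names, seed and sample size are hypothetical.

```python
import math
import random


def avg_loglik(xs, theta):
    """n^{-1} log L_n(x, theta) for the toy model N(theta, 1)."""
    c = -0.5 * math.log(2 * math.pi)
    return sum(c - 0.5 * (x - theta) ** 2 for x in xs) / len(xs)


def z(theta, theta0):
    """z(theta) = E_{theta0} log f(X, theta) for the same toy model."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (1.0 + (theta - theta0) ** 2)


rng = random.Random(3)
theta0 = 0.0
xs = [rng.gauss(theta0, 1.0) for _ in range(100_000)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
gap = max(abs(avg_loglik(xs, t) - z(t, theta0)) for t in grid)
best = max(grid, key=lambda t: avg_loglik(xs, t))
print(gap, best)  # gap is small and best is the grid point 0.0
```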

Assumption 2. $\Omega$ is a convex compact subset of $R^s$.

Assumption 3. For almost all $t \in R^a$, $\log f(t,\cdot)$ is continuous on $\Omega$.

Assumption 4. For almost all $t \in R^a$ and every $\theta \in \Omega$, $\partial\log f(t,\theta)/\partial\theta_i$ $(i = 1, 2, \cdots, s)$ exists and $|\partial\log f(t,\theta)/\partial\theta_i| < g(t)$ $(i = 1, 2, \cdots, s)$, where $\int_{R^a}g(t)\,dF(t,\theta_0)$ is finite.

Assumption 5. The function $h$ is continuous on $\Omega$.

Assumption 6. There exists a point $\theta^* \in \omega$ such that $z(\theta^*) > z(\theta)$ when $\theta \in \omega$ and $\theta \neq \theta^*$.


Assumptions 2-4 ensure that, for almost all $x$, the sequence $(n^{-1}\log L_n(x,\theta))$ converges to $z(\theta)$ uniformly with respect to $\theta$ in the set $\Omega$. Assumptions 2 and 5 ensure that $\omega$ is a compact subset of $R^s$. Consequently any continuous function on $\omega$ attains its supremum at some point of $\omega$. In particular, for almost all $x$, the function $\log L_n(x,\cdot)$ attains its supremum in $\omega$ at some point $\hat\theta_n(x,\omega)$ of $\omega$. Assumption 6 then ensures that for almost all $x$ the sequence $(\hat\theta_n(x,\omega))$ converges to $\theta^*$. The proofs of these results are fairly straightforward and we omit them.

Note that if $\theta_0 \in \omega$, then $\theta_0$ will usually satisfy the condition demanded of $\theta^*$. This has been proved by Wald [7]. In fact, when interest is concentrated on the case where $\theta_0 \in \omega$, Assumption 6 may be replaced by the following.

Assumption 6A. $\theta_0 \in \omega$, and if $\theta \neq \theta_0$ then $F(t,\theta) \neq F(t,\theta_0)$ for at least one $t \in R^a$. This is sufficient to ensure that $z(\theta_0) > z(\theta)$ if $\theta \neq \theta_0$.

As stated above, Assumptions 1-6 ensure the existence of a maximum likelihood estimator in $\omega$ of $\theta_0$ which converges with probability one to $\theta^*$. Suppose in addition that we make the following Assumption 7 and that $h$ is differentiable. Then for large $n$ and most $x$, $\hat\theta_n(x,\omega)$ will be an interior point of $\omega$ and will therefore emerge as a solution of the restricted likelihood equations.







Assumption 7. $\theta^*$ is an interior point of $\omega$.

Now, making Assumptions 1-7, we will use these likelihood equations in discussing the asymptotic distribution of $\hat\theta_n(\cdot,\omega)$.


4. The asymptotic distribution of $\hat\theta_n(\cdot,\omega)$. The usual derivation of the asymptotic distribution of maximum likelihood estimators, for example by Cramér [4], expands the likelihood function by Taylor's Theorem. To adopt this method here, we now introduce the following assumptions, which are similar to Cramér's.

Assumption 8. The functions $h_i$ possess first and second order partial derivatives which are continuous (and so bounded) on $\Omega$.

Assumption 9. For almost all $t \in R^a$, the function $\log f(t,\cdot)$ possesses continuous second order partial derivatives in a neighborhood of $\theta^*$. Also, if $\theta$ belongs to this neighborhood, then $|\partial^2\log f(t,\theta)/\partial\theta_i\partial\theta_j| < G_1(t)$ $(i, j = 1, 2, \cdots, s)$, where $\int_{R^a}G_1(t)\,dF(t,\theta_0)$ is finite.

Assumption 10. For almost all $t \in R^a$, the function $\log f(t,\cdot)$ possesses third order partial derivatives in a neighborhood of $\theta^*$, and if $\theta$ is in this neighborhood, then

$$|\partial^3\log f(t,\theta)/\partial\theta_i\partial\theta_j\partial\theta_k| < G_2(t) \qquad (i, j, k = 1, 2, \cdots, s),$$

where $\int_{R^a}G_2(t)\,dF(t,\theta_0)$ is finite.


(4.1) For our purposes, the important implications of Assumptions 4, 9 and 10 are as follows.

(4.1.1) The vector $Dz(\theta)$ exists for every $\theta \in \Omega$, and the sequence $(Dn^{-1}\log L_n(x,\theta))$ of vectors converges for almost all $x$ to $Dz(\theta)$ (Assumption 4).

(4.1.2) The matrix $D^2z(\theta^*)$ exists, and the sequence $(D^2n^{-1}\log L_n(x,\theta^*))$ of matrices converges for almost all $x$ to $D^2z(\theta^*)$ (Assumption 9).

(4.1.3) For almost all $x$ and $i, j, k = 1, 2, \cdots, s$, the sequence $(n^{-1}\partial^3\log L_n(x,\theta)/\partial\theta_i\partial\theta_j\partial\theta_k)$ is bounded uniformly with respect to $\theta$ in a neighborhood of $\theta^*$ (Assumption 10).

Each of these three statements is almost a direct consequence of the Strong Law of Large Numbers.

We are now in a position to obtain the asymptotic distribution of $\hat\theta_n(\cdot,\omega)$. For brevity we will now write $\tilde\theta$ instead of $\hat\theta_n(x,\omega)$. Since $\tilde\theta \to \theta^*$ for almost all $x$, applying Taylor's Theorem and using (4.1.2) and (4.1.3) gives

$$Dn^{-1}\log L_n(x,\tilde\theta) = Dn^{-1}\log L_n(x,\theta^*) + [D^2z(\theta^*) + o(1)][\tilde\theta - \theta^*] \tag{4.1.4}$$

for almost all $x$.
Also, because the first partial derivatives of the functions $h_i$ are continuous, for almost all $x$,

$$H_{\tilde\theta} = H_{\theta^*} + o(1) \tag{4.1.5}$$

and, since $h(\theta^*) = 0$,

$$h(\tilde\theta) = [H'_{\theta^*} + o(1)][\tilde\theta - \theta^*]. \tag{4.1.6}$$

For almost any $x$, if $n$ is sufficiently large, $\tilde\theta$ will satisfy the restricted likelihood equations with a certain Lagrangian multiplier $\lambda_n(x)$. Writing $\lambda$ in place of $\lambda_n(x)$ for brevity, we have

$$Dn^{-1}\log L_n(x,\theta^*) + [D^2z(\theta^*) + o(1)][\tilde\theta - \theta^*] + H_{\tilde\theta}\lambda = 0, \tag{4.1.7}$$
$$[H'_{\theta^*} + o(1)][\tilde\theta - \theta^*] = 0. \tag{4.1.8}$$


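Since $z(\theta^*)$ is a maximum in the set $\omega$ of the function $z$, there exists a Lagrangian multiplier $\lambda^* = (\lambda_1^*, \lambda_2^*, \cdots, \lambda_r^*)$ such that

$$Dz(\theta^*) + H_{\theta^*}\lambda^* = 0. \tag{4.1.9}$$

Subtracting (4.1.9) from (4.1.7) and using (4.1.5), we obtain

$$[Dn^{-1}\log L_n(x,\theta^*) - Dz(\theta^*)] + [D^2z(\theta^*) + o(1)][\tilde\theta - \theta^*] + [H_{\theta^*} + o(1)][\lambda - \lambda^*] + [H_{\tilde\theta} - H_{\theta^*}]\lambda^* = 0. \tag{4.1.10}$$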


Now we expand the elements of the matrix $H_{\tilde\theta}$ by Taylor's Theorem. Because the second order partial derivatives of the functions $h_i$ are continuous, we find that, for almost all $x$,

$$[H_{\tilde\theta} - H_{\theta^*}]\lambda^* = \Big[\sum_{i=1}^{r}\lambda_i^*D^2h_i(\theta^*) + o(1)\Big][\tilde\theta - \theta^*]. \tag{4.1.11}$$


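We will denote by $-B^*_{\theta^*}$ the matrix $D^2z(\theta^*) + \sum_{i=1}^{r}\lambda_i^*D^2h_i(\theta^*)$. Substituting into (4.1.10) the expression for $[H_{\tilde\theta} - H_{\theta^*}]\lambda^*$ given by (4.1.11), we have

$$[B^*_{\theta^*} + o(1)][\tilde\theta - \theta^*] - [H_{\theta^*} + o(1)][\lambda - \lambda^*] = Dn^{-1}\log L_n(x,\theta^*) - Dz(\theta^*). \tag{4.1.12}$$

Combining (4.1.12) and (4.1.8), we may write

$$\begin{bmatrix} B^*_{\theta^*} + o(1) & -H_{\theta^*} + o(1)\\ -H'_{\theta^*} + o(1) & O\end{bmatrix}\begin{bmatrix}\tilde\theta - \theta^*\\ \lambda - \lambda^*\end{bmatrix} = \begin{bmatrix}Dn^{-1}\log L_n(x,\theta^*) - Dz(\theta^*)\\ 0\end{bmatrix}. \tag{4.1.13}$$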

We now make the final assumptions, which enable us to derive the asymptotic distribution of $\hat\theta_n(\cdot,\omega)$ and $\lambda_n(\cdot)$.

Assumption 11. The matrix

$$\begin{bmatrix} B^*_{\theta^*} & -H_{\theta^*}\\ -H'_{\theta^*} & O\end{bmatrix}$$

is non-singular.







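Assumption 12. For $i, j = 1, 2, \cdots, s$, $\beta_{ij}(\theta^*) = \int_{R^a}\frac{\partial\log f(t,\theta^*)}{\partial\theta_i}\,\frac{\partial\log f(t,\theta^*)}{\partial\theta_j}\,dF(t,\theta_0)$ exists.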


We now define

$$\begin{bmatrix} B^*_{\theta^*} & -H_{\theta^*}\\ -H'_{\theta^*} & O\end{bmatrix}^{-1} = \begin{bmatrix} P^*_{\theta^*} & Q^*_{\theta^*}\\ Q^{*\prime}_{\theta^*} & R^*_{\theta^*}\end{bmatrix}$$

and $V_{\theta^*} = (\beta_{ij}(\theta^*)) - [Dz(\theta^*)][Dz(\theta^*)]'$. Since the matrix $V_{\theta^*}$ exists, the multivariate Central Limit Theorem (Cramér [3]) shows that $\sqrt{n}[Dn^{-1}\log L_n(\cdot,\theta^*) - Dz(\theta^*)]$ is asymptotically normal with mean 0 and variance matrix $V_{\theta^*}$. Then from (4.1.13), by the multivariate extension of a theorem of Cramér [4], we have the results stated in the following lemma.
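The blocks $P^*$, $Q^*$, $R^*$ can be read off from an inverse of the bordered matrix. Multiplying out the defining identity shows the following. When $B$ is non-singular and $H$ has rank $r$, $R = -(H'B^{-1}H)^{-1}$, $Q = B^{-1}HR$ and $P = B^{-1} + QR^{-1}Q'$. The Python sketch below is an illustration with hypothetical helper names and arbitrary small matrices. It computes the inverse exactly with fractions and checks these closed forms. Here $B$ stands for $B^*_{\theta^*}$ and $H$ for $H_{\theta^*}$.

```python
from fractions import Fraction as Fr


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a non-singular matrix of Fractions."""
    m = len(A)
    M = [list(row) + [Fr(int(i == j)) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[m:] for row in M]


def bordered_blocks(B, H):
    """P, Q, R with [[B, -H], [-H', O]]^{-1} = [[P, Q], [Q', R]]."""
    s, r = len(B), len(H[0])
    K = [list(B[i]) + [-H[i][j] for j in range(r)] for i in range(s)]
    K += [[-H[i][j] for i in range(s)] + [Fr(0)] * r for j in range(r)]
    Kinv = inverse(K)
    P = [row[:s] for row in Kinv[:s]]
    Q = [row[s:] for row in Kinv[:s]]
    R = [row[s:] for row in Kinv[s:]]
    return P, Q, R


B = [[Fr(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 2]]]
H = [[Fr(1)], [Fr(1)], [Fr(1)]]                 # s = 3, r = 1
P, Q, R = bordered_blocks(B, H)
Bi = inverse(B)
print(R)  # equals -(H' B^{-1} H)^{-1}
```

Exact rational arithmetic is used so that the identities can be compared with `==` rather than with a tolerance.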

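Lemma 1. Under Assumptions 1-12 the random vector

$$\sqrt{n}\begin{bmatrix}\hat\theta_n(\cdot,\omega) - \theta^*\\ \lambda_n(\cdot) - \lambda^*\end{bmatrix}$$

is asymptotically normally distributed with mean 0 and variance matrix

$$\begin{bmatrix} P^*_{\theta^*}V_{\theta^*}P^*_{\theta^*} & P^*_{\theta^*}V_{\theta^*}Q^*_{\theta^*}\\ Q^{*\prime}_{\theta^*}V_{\theta^*}P^*_{\theta^*} & Q^{*\prime}_{\theta^*}V_{\theta^*}Q^*_{\theta^*}\end{bmatrix}.$$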

We have now obtained a formal result on the behavior of the restricted maximum likelihood estimator for large $n$. In most practical situations this result might be used to determine the large sample power function of the test, proposed by Aitchison and Silvey, of the hypothesis that $\theta_0 \in \omega$. (This might involve a considerable amount of computation.) The same paper proposes a method of solving the likelihood equations. How far that method can be used when $\theta_0 \notin \omega$ remains obscure, as does any general picture of the power of the test. Some light is shed on these questions, however, by considering how the results obtained here particularize when $\theta_0 \in \omega$.


(4.2) Accordingly we consider what happens when we replace Assumption 6 by Assumption 6A. Then $\theta_0$ replaces $\theta^*$, and $z(\theta_0)$, the maximum of $z$ in the set $\omega$, is also the maximum of $z$ in the set $\Omega$. Hence $Dz(\theta_0) = 0$ and $\lambda^* = 0$. The matrix $V_{\theta^*}$ becomes the matrix $B_{\theta_0}$. We add the mild assumption

Assumption 13. $\int_{R^a}\partial^2 f(t,\theta)/\partial\theta_i\partial\theta_j\,dt = 0$ $(i, j = 1, 2, \cdots, s)$.

With it, the matrix $B^*_{\theta^*}$ also becomes $B_{\theta_0}$. Consequently we have exactly the result of the previous paper [1] on the asymptotic distribution of the restricted estimator and the corresponding Lagrangian multiplier. So far as comparison is possible, the assumptions made here in deriving this distribution are stronger than those of the previous paper. But we have now obtained a result about the genuine maximum likelihood estimator rather than merely a solution of the likelihood equations. (A greater degree of similarity between the two sets







of assumptions is apparent if we note that, when $\theta_0 \in \omega$, we might replace Assumption 11 by the following

Assumption 11A. The matrix $B_{\theta_0}$ is positive definite and the matrix $H_{\theta_0}$ is of rank $r$.)


(4.3) We can now picture the typical practical situation when $n$ is large and $\theta_0$, while not belonging to the set $\omega$, is very near this set. Usually $z(\theta_0)$ will then be $\sup_{\theta\in\Omega}z(\theta)$ and $\theta^*$ will be near $\theta_0$, so that $Dz(\theta^*)$ will be near $Dz(\theta_0) = 0$. Then $\lambda^*$ also will be near 0, though, since $n$ is large, $\sqrt{n}\lambda^*$ may differ appreciably from 0. Also the elements of $D^2z(\theta^*)$ will be near those of $D^2z(\theta_0) = -B_{\theta_0}$. If in addition $\beta_{ij}(\theta^*)$ is near the corresponding element of $B_{\theta_0}$, as will usually be the case, then approximately

$$\sqrt{n}\begin{bmatrix}\hat\theta_n(\cdot,\omega) - \theta^*\\ \lambda_n(\cdot) - \lambda^*\end{bmatrix}$$

will have a multivariate normal distribution with mean 0 and variance matrix

$$\begin{bmatrix}P_{\theta_0} & O\\ O & -R_{\theta_0}\end{bmatrix},$$

this matrix being as defined in [1]. (A rigorous mathematical derivation of this result would be possible. One would let the true parameter $\theta_0$ vary with $n$ so that its distance from the set $\omega$ tends to 0 as $n \to \infty$. One would also impose suitable restrictions on the functions $f$ and $h$ to ensure that what is here said to happen usually would in fact happen. But this does not seem particularly profitable.)


(4.4) Finally, recall the remarks of the previous paragraph and the flexibility of Newton's method of solving equations. In view of these, we might expect the iterative method of solving the restricted likelihood equations suggested in [1] to remain applicable when $\theta_0$ is near the set $\omega$ and $n$ is large.


5. Three tests of the hypothesis that $\theta_0 \in \omega$. We will now compare three intuitively reasonable tests of the hypothesis that $\theta_0 \in \omega$. These are as follows.

(i) The likelihood ratio test. We accept the hypothesis if $\mu(x) = \sup_{\theta\in\omega}L_n(x,\theta)/\sup_{\theta\in\Omega}L_n(x,\theta)$ is "sufficiently near" 1.

(ii) The Wald test. Assuming the existence of $\hat\theta_n(x,\Omega)$, we accept the hypothesis if $h(\hat\theta_n(x,\Omega))$ is "sufficiently near" 0 (Wald [8]).

(iii) The Lagrangian multiplier test. Assuming the existence of $\hat\theta_n(x,\omega)$ and $\lambda_n(x)$, we accept the hypothesis if $\lambda_n(x)$ is "sufficiently near" 0 (Aitchison and Silvey [1]).

For typographical brevity, we now write $\hat\theta$ for the unrestricted maximum likelihood estimator $\hat\theta_n(\cdot,\Omega)$ and $\tilde\theta$ for the restricted one, $\hat\theta_n(\cdot,\omega)$. We also write $\tilde\lambda$ for the random variable $\lambda_n(\cdot)$.

The measure of the distance from 0 of $h(\hat\theta)$ used by Wald is, in our notation,










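$-n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)]$. His test is based on this random variable. He has shown that under general conditions $-2\log\mu$ and $-n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)]$ have the same asymptotic distribution. In the test proposed by Aitchison and Silvey, the distance of $\tilde\lambda$ from 0 is measured by $-n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda$. We will now show that, subject to the following assumptions A,

$$\operatorname{plim}\,2\log\mu = \operatorname{plim}\,n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)] = \operatorname{plim}\,n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda.$$

This holds in the sense that the difference between any two of these random variables converges in probability to 0.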
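A linear Gaussian toy model lets the three statistics be computed side by side. The Python sketch below assumes, purely for illustration, $X = (X_1, X_2) \sim N(\theta, I_2)$ and $h(\theta) = \theta_1 - \theta_2$. Then $B_\theta = I$, $H_\theta = (1,-1)'$ and $R_\theta = -(H'B^{-1}H)^{-1} = -\tfrac12$ for every $\theta$. In this special case all three statistics equal $n(\bar x_1 - \bar x_2)^2/2$ exactly. That agreement is much stronger than the asymptotic equivalence asserted above for general models. The names and the seed are hypothetical.

```python
import random


def three_statistics(data):
    """-2 log mu, Wald and Lagrangian multiplier statistics in the toy model
    X ~ N(theta, I_2), h(theta) = theta_1 - theta_2 (so r = 1, R = -1/2)."""
    n = len(data)
    xbar = [sum(p[k] for p in data) / n for k in range(2)]   # unrestricted MLE
    m = (xbar[0] + xbar[1]) / 2                               # restricted MLE is (m, m)
    lam = (xbar[1] - xbar[0]) / 2                             # Lagrangian multiplier
    R = -0.5

    def loglik(t):
        # log L_n(x, t) up to an additive constant that cancels in the ratio
        return -0.5 * sum((p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2 for p in data)

    lr = -2.0 * (loglik([m, m]) - loglik(xbar))
    h_hat = xbar[0] - xbar[1]
    wald = -n * h_hat * R * h_hat
    lm = -n * lam * (1.0 / R) * lam
    return lr, wald, lm


rng = random.Random(5)
data = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(1000)]
lr, wald, lm = three_statistics(data)
print(lr, wald, lm)  # identical up to rounding in this linear Gaussian case
```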

Assumptions A. By assumptions A we mean the following set of assumptions: 1-5, 6A, 7-10, 11A, 13 and

Assumption 12A. The matrix $B_\theta$ exists in a neighborhood of $\theta_0$, and its elements are continuous functions of $\theta$ there.

Of course, when assumption 6A is made, $\theta^*$ is replaced by $\theta_0$ in subsequent assumptions.

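We have already seen that these assumptions imply that $\tilde\theta$ exists and almost certainly converges to $\theta_0$, and that for large $n$ and most $x$, $\lambda_n(x)$ exists. When assumption 6A replaces assumption 6, (4.1.13) takes a particular form. From that form it is not difficult to obtain

$$\sqrt{n}(\tilde\theta - \theta_0) = n^{-1/2}P_{\theta_0}D\log L_n(\cdot,\theta_0) + o_p(1), \tag{5.1}$$
$$\sqrt{n}\,\tilde\lambda = n^{-1/2}Q'_{\theta_0}D\log L_n(\cdot,\theta_0) + o_p(1). \tag{5.2}$$

Here $o_p$ is used in the sense of Mann and Wald [6], and $P_{\theta_0}$, $Q_{\theta_0}$, $R_{\theta_0}$ are defined by

$$\begin{bmatrix}B_{\theta_0} & -H_{\theta_0}\\ -H'_{\theta_0} & O\end{bmatrix}^{-1} = \begin{bmatrix}P_{\theta_0} & Q_{\theta_0}\\ Q'_{\theta_0} & R_{\theta_0}\end{bmatrix}. \tag{5.3}$$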


The same kind of argument as above easily shows that the assumptions A imply that $\hat\theta$ exists and almost certainly converges to $\theta_0$, and that

(5.4) for almost any $x$ and sufficiently large $n$, $D\log L_n[x,\hat\theta(x)] = 0$,

$$\sqrt{n}(\hat\theta - \theta_0) = n^{-1/2}B_{\theta_0}^{-1}D\log L_n(\cdot,\theta_0) + o_p(1). \tag{5.5}$$


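We will now use these results to prove the following lemmas.

Lemma 2. Subject to assumptions A,

$$-2\log\mu = n(\hat\theta - \tilde\theta)'B_{\theta_0}(\hat\theta - \tilde\theta) + o_p(1).$$

Proof. Clearly from (5.1) and (5.5), $\|\hat\theta - \tilde\theta\| = O_p(n^{-1/2})$. Hence, expanding $\log L_n(\cdot,\tilde\theta)$ about $\hat\theta$ by Taylor's Theorem and using (4.1.3) and (5.4), we have

$$\log L_n(\cdot,\tilde\theta) = \log L_n(\cdot,\hat\theta) + \tfrac12(\tilde\theta - \hat\theta)'[D^2\log L_n(\cdot,\hat\theta)][\tilde\theta - \hat\theta] + o_p(1).$$

Again by Taylor's Theorem, $n^{-1}D^2\log L_n(\cdot,\hat\theta) = n^{-1}D^2\log L_n(\cdot,\theta_0) + o_p(1)$. Assumptions 9 and 13 imply $D^2z(\theta_0) = -B_{\theta_0}$, so (4.1.2) gives

$$n^{-1}D^2\log L_n(\cdot,\theta_0) = -B_{\theta_0} + o_p(1).$$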





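Hence

$$\log\mu = \log L_n(\cdot,\tilde\theta) - \log L_n(\cdot,\hat\theta) = -\tfrac12 n(\hat\theta - \tilde\theta)'[B_{\theta_0} + o_p(1)](\hat\theta - \tilde\theta) + o_p(1),$$

and the result follows because $\|\hat\theta - \tilde\theta\| = O_p(n^{-1/2})$.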
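Lemma 3. Subject to assumptions A, $2\log\mu = n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda + o_p(1)$.

Proof. We have

$$\sqrt{n}(\tilde\theta - \hat\theta) = n^{-1/2}(P_{\theta_0} - B_{\theta_0}^{-1})D\log L_n(\cdot,\theta_0) + o_p(1).$$

Now

$$[P_{\theta_0} - B_{\theta_0}^{-1}]\,B_{\theta_0}\,[P_{\theta_0} - B_{\theta_0}^{-1}] = B_{\theta_0}^{-1} - P_{\theta_0} = -Q_{\theta_0}R_{\theta_0}^{-1}Q'_{\theta_0},$$

these matrix relationships following easily from the definition of $P_{\theta_0}$, $Q_{\theta_0}$ and $R_{\theta_0}$ in (5.3). Hence

$$n(\hat\theta - \tilde\theta)'B_{\theta_0}(\hat\theta - \tilde\theta) = -n^{-1}[D\log L_n(\cdot,\theta_0)]'Q_{\theta_0}R_{\theta_0}^{-1}Q'_{\theta_0}[D\log L_n(\cdot,\theta_0)] + o_p(1) = -n\tilde\lambda'R_{\theta_0}^{-1}\tilde\lambda + o_p(1),$$

by (5.2). By assumption 12A, the elements of the matrix $B_\theta$ are continuous functions in a neighborhood of $\theta_0$. By 11A, $B_{\theta_0}$ is positive definite. Hence $B_\theta$ is also positive definite in a neighborhood of $\theta_0$. Similarly, $H_\theta$ is of rank $r$ in a neighborhood of $\theta_0$. So the matrix $R_\theta$ exists there, and its elements are continuous functions of $\theta$. It follows from the strong convergence of $\tilde\theta$ to $\theta_0$ that $R_{\tilde\theta}^{-1} = R_{\theta_0}^{-1} + o_p(1)$, and this completes the proof.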

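Lemma 4. Subject to assumptions A, $2\log\mu = n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)] + o_p(1)$.

Proof. The second derivatives of the functions $h_i$ are bounded on $\Omega$ (assumption 8), and $\|\hat\theta - \theta_0\|$ is $O_p(n^{-1/2})$. Therefore

$$h(\hat\theta) = h(\theta_0) + H'_{\theta_0}(\hat\theta - \theta_0) + O_p(n^{-1}) = H'_{\theta_0}(\hat\theta - \theta_0) + O_p(n^{-1}),$$

since $\theta_0 \in \omega$ by 6A. Hence $\sqrt{n}\,h(\hat\theta) = n^{-1/2}H'_{\theta_0}B_{\theta_0}^{-1}D\log L_n(\cdot,\theta_0) + o_p(1)$ and

$$n[h(\hat\theta)]'R_{\theta_0}[h(\hat\theta)] = n^{-1}[D\log L_n(\cdot,\theta_0)]'B_{\theta_0}^{-1}H_{\theta_0}R_{\theta_0}H'_{\theta_0}B_{\theta_0}^{-1}[D\log L_n(\cdot,\theta_0)] + o_p(1).$$

It is easy to show that $B_{\theta_0}^{-1}H_{\theta_0}R_{\theta_0}H'_{\theta_0}B_{\theta_0}^{-1} = Q_{\theta_0}R_{\theta_0}^{-1}Q'_{\theta_0}$. It follows that $n[h(\hat\theta)]'R_{\theta_0}[h(\hat\theta)] = n\tilde\lambda'R_{\theta_0}^{-1}\tilde\lambda + o_p(1)$. Finally, as in Lemma 3, $R_{\hat\theta} = R_{\theta_0} + o_p(1)$, which completes the proof.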
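The matrix identities used in the proofs of Lemmas 3 and 4 can be spot-checked in exact arithmetic. The Python sketch below is an illustration with arbitrary matrices and hypothetical helper names. Here $B$ plays the role of $B_{\theta_0}$ and $H$ that of $H_{\theta_0}$, with $s = 3$ and $r = 2$. The sketch builds $P$, $Q$ and $R$ from (5.3) and verifies two identities: $[P - B^{-1}]B[P - B^{-1}] = B^{-1} - P = -QR^{-1}Q'$, and $B^{-1}HRH'B^{-1} = QR^{-1}Q'$.

```python
from fractions import Fraction as Fr


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a non-singular matrix of Fractions."""
    m = len(A)
    M = [list(row) + [Fr(int(i == j)) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[m:] for row in M]


def blocks(B, H):
    """P, Q, R of (5.3): [[B, -H], [-H', O]]^{-1} = [[P, Q], [Q', R]]."""
    s, r = len(B), len(H[0])
    K = [list(B[i]) + [-H[i][j] for j in range(r)] for i in range(s)]
    K += [[-H[i][j] for i in range(s)] + [Fr(0)] * r for j in range(r)]
    Kinv = inverse(K)
    return ([row[:s] for row in Kinv[:s]], [row[s:] for row in Kinv[:s]],
            [row[s:] for row in Kinv[s:]])


B = [[Fr(v) for v in row] for row in [[4, 1, 0], [1, 3, 1], [0, 1, 2]]]
H = [[Fr(v) for v in row] for row in [[1, 0], [1, 1], [0, 1]]]   # s = 3, r = 2
P, Q, R = blocks(B, H)
Bi = inverse(B)
QRQ = matmul(matmul(Q, inverse(R)), transpose(Q))                 # Q R^{-1} Q'
D = sub(P, Bi)                                                    # P - B^{-1}
lemma3_ok = (matmul(matmul(D, B), D) == sub(Bi, P)
             and sub(Bi, P) == [[-v for v in row] for row in QRQ])
lemma4_ok = matmul(matmul(matmul(Bi, H), R), matmul(transpose(H), Bi)) == QRQ
print(lemma3_ok, lemma4_ok)
```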

Lemma 5. Subject to assumptions A, each of the random variables $-2\log\mu$, $-n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)]$ and $-n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda$ is asymptotically distributed as $\chi^2$ with $r$ degrees of freedom.

This follows from lemmas 3 and 4 and from the fact that $\sqrt{n}\,\tilde\lambda$ is asymptotically normally distributed with mean 0 and variance matrix $-R_{\theta_0}$.

In consequence of lemma 5, when $n$ is large the natural choices of critical regions of size $\alpha$ for testing the hypothesis that $\theta_0 \in \omega$ on the bases (i), (ii) and (iii)







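are $C_1$, $C_2$ and $C_3$ respectively, where

$C_1$ is the set of $x$ on which $-2\log\mu > k_\alpha$,
$C_2$ is the set of $x$ on which $-n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)] > k_\alpha$, and
$C_3$ is the set of $x$ on which $-n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda > k_\alpha$.

Here $k_\alpha$ is determined by $\Pr\{\chi^2_{(r)} > k_\alpha\} = \alpha$.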
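For moderate $r$ the constant $k_\alpha$ can be computed without statistical tables. The sketch below shows one way to do this; it is not taken from the paper. It evaluates $\Pr\{\chi^2_{(r)} > x\}$ from the standard series for the regularized lower incomplete gamma function, then solves $\Pr\{\chi^2_{(r)} > k_\alpha\} = \alpha$ by bisection. The function names are hypothetical. The series is intended only for the moderate arguments that arise here.

```python
import math


def chi2_sf(x, r):
    """Pr{chi^2_(r) > x}, via the series for the regularized lower
    incomplete gamma function P(r/2, x/2) (adequate for moderate x)."""
    a, y = r / 2.0, x / 2.0
    if y <= 0.0:
        return 1.0
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))  # k = 0 term
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= y / (a + k)          # ratio of successive series terms
        total += term
        k += 1
    return 1.0 - total


def critical_value(alpha, r):
    """k_alpha with Pr{chi^2_(r) > k_alpha} = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, r) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, r) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(critical_value(0.05, 1), critical_value(0.05, 2))
```

For $r = 2$ the result can be checked by hand, since there $\Pr\{\chi^2_{(2)} > x\} = e^{-x/2}$ and so $k_\alpha = -2\log\alpha$.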

Wald [8] has shown that the tests based on the critical regions $C_1$ and $C_2$ usually have asymptotically the same power. His argument essentially shows two things. If $n$ is large and $\theta_0$ is not near $\omega$, the power of each test is near 1. If $\theta_0$ is near $\omega$, each of $-2\log\mu$ and $-n[h(\hat\theta)]'R_{\hat\theta}[h(\hat\theta)]$ has approximately a non-central $\chi^2$-distribution with the same parameters. We now inquire, without going into rigorous mathematical detail, whether this type of argument will usually hold when we compare the tests based on the critical regions $C_1$ and $C_3$.

We consider first what happens when $n$ is large and $\theta_0$ is near $\omega$. As we have seen, $\theta^*$ will then usually be near $\theta_0$. We suppose that $\theta_0$ is near enough $\omega$ to ensure that $\theta^* - \theta_0$ is near 0, though $\sqrt{n}(\theta^* - \theta_0)$ may differ appreciably from 0. By the remarks made in (4.3), in most practical situations we will then have

$$\sqrt{n}(\tilde\theta - \theta^*) \sim n^{-1/2}P_{\theta_0}D\log L_n(\cdot,\theta_0), \tag{5.6}$$
$$\sqrt{n}(\tilde\lambda - \lambda^*) \sim n^{-1/2}Q'_{\theta_0}D\log L_n(\cdot,\theta_0), \tag{5.7}$$

where $\sim$ denotes approximate equality with probability near 1, for large $n$.

Also, $Dz(\theta^*) + H_{\theta^*}\lambda^* = 0$, and usually $Dz(\theta_0) = 0$ and $D^2z(\theta_0) = -B_{\theta_0}$. Hence, approximately,

$$-B_{\theta_0}(\theta^* - \theta_0) + H_{\theta_0}\lambda^* = 0. \tag{5.8}$$

The distribution of $\hat\theta$ does not depend on whether $\theta_0$ is in $\omega$ or not, so it remains true (see (5.5)) that

$$\sqrt{n}(\hat\theta - \theta_0) \sim n^{-1/2}B_{\theta_0}^{-1}D\log L_n(\cdot,\theta_0). \tag{5.9}$$


Also, examining the details of the proof of lemma 2 shows that the result obtained there, namely

$$-2\log\mu \sim n(\hat\theta - \tilde\theta)'B_{\theta_0}(\hat\theta - \tilde\theta), \tag{5.10}$$

still holds.
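Now from (5.6) and (5.9) we have

$$\begin{aligned}\sqrt{n}(\tilde\theta - \hat\theta) &\sim \sqrt{n}(\theta^* - \theta_0) + n^{-1/2}(P_{\theta_0} - B_{\theta_0}^{-1})D\log L_n(\cdot,\theta_0)\\ &= \sqrt{n}(\theta^* - \theta_0) + n^{-1/2}Q_{\theta_0}R_{\theta_0}^{-1}Q'_{\theta_0}D\log L_n(\cdot,\theta_0)\\ &\sim \sqrt{n}\,B_{\theta_0}^{-1}H_{\theta_0}\lambda^* + \sqrt{n}\,Q_{\theta_0}R_{\theta_0}^{-1}(\tilde\lambda - \lambda^*),\end{aligned}$$

by (5.8) and (5.7). It is not difficult to show that $Q_{\theta_0}R_{\theta_0}^{-1} = B_{\theta_0}^{-1}H_{\theta_0}$, and so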







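$\sqrt{n}(\tilde\theta - \hat\theta) \sim \sqrt{n}\,B_{\theta_0}^{-1}H_{\theta_0}\tilde\lambda$. Hence, in the usual practical situation, when $n$ is large and $\theta_0$ is near enough $\omega$ to ensure that $\theta^* - \theta_0$ is near 0, we will have

$$-2\log\mu \sim n(\hat\theta - \tilde\theta)'B_{\theta_0}(\hat\theta - \tilde\theta) \sim n\tilde\lambda'H'_{\theta_0}B_{\theta_0}^{-1}H_{\theta_0}\tilde\lambda = -n\tilde\lambda'R_{\theta_0}^{-1}\tilde\lambda \sim -n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda.$$

Consequently, the tests based on the critical regions $C_1$ and $C_3$ will have approximately the same power in these circumstances. Moreover, $-2\log\mu$ and $-n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda$ will then each have approximately a non-central $\chi^2$-distribution with $r$ degrees of freedom and parameter $-n\lambda^{*\prime}R_{\theta_0}^{-1}\lambda^*$. (Again this argument could clearly be made rigorous. One would let $\theta_0$ vary with $n$ so that $\|\theta^* - \theta_0\| = O(n^{-1/2})$, and impose suitable conditions on the functions $f$ and $h$.)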
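The non-central approximation can be checked by simulation in a toy model where it happens to be exact. Purely as an illustration (this model is not in the paper), take $X \sim N(\theta_0, I_2)$, $h(\theta) = \theta_1 - \theta_2$ and $\theta_0 = (\delta, 0)$. Then $\lambda^* = -\delta/2$ and $R = -\tfrac12$. In this model $-n\tilde\lambda'R^{-1}\tilde\lambda = n(\bar x_1 - \bar x_2)^2/2$ is exactly non-central $\chi^2$ with one degree of freedom. Its parameter is $-n\lambda^{*\prime}R^{-1}\lambda^* = n\delta^2/2$ and its mean is $1 + n\delta^2/2$. The sketch draws the sample means directly from their exact $N(\theta_0, I/n)$ distribution. The names, seed and settings are hypothetical.

```python
import random


def lm_statistic_draws(delta, n, reps=100_000, seed=4):
    """Draws of -n * lam * R^{-1} * lam in the toy model (r = 1, R = -1/2)."""
    rng = random.Random(seed)
    sd = n ** -0.5                        # each sample mean is N(theta0_k, 1/n)
    R = -0.5
    draws = []
    for _ in range(reps):
        x1 = rng.gauss(delta, sd)
        x2 = rng.gauss(0.0, sd)
        lam = (x2 - x1) / 2               # multiplier from the restricted equations
        draws.append(-n * lam * lam / R)
    return draws


delta, n = 0.3, 100
lam_star = -delta / 2                                # (theta0_2 - theta0_1) / 2
noncentrality = -n * lam_star ** 2 / (-0.5)          # -n lambda*' R^{-1} lambda*
draws = lm_statistic_draws(delta, n)
mean_T = sum(draws) / len(draws)
power = sum(d > 3.841458820694124 for d in draws) / len(draws)  # C_3 at alpha = 0.05
print(noncentrality, mean_T, power)                  # mean_T is near 1 + noncentrality
```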

We now consider the power of the Lagrangian multiplier test when $n$ is large and $\theta_0$ is not near $\omega$. The asymptotic distribution of $\lambda_n$ will then usually be as given in Lemma 1. Suppose $\lambda^*$ is not near 0. Then with high probability $\sqrt{n}\,\tilde\lambda$ will be far from 0. Since the matrix $-R_{\tilde\theta}$ will normally be positive definite, the power of the test based on $C_3$ will be near 1. However, $\theta_0$ might be such that the function $z$ has a stationary value at $\theta^*$, in which case $\lambda^* = 0$. Then $-n\tilde\lambda'R_{\tilde\theta}^{-1}\tilde\lambda$ would not necessarily be large with high probability, and the power of the test based on $C_3$ would not be near 1 for such a $\theta_0$. But this contingency does not seem likely to arise often (the author has been unable to find an example of it). We may conclude that in most practical situations the Lagrangian multiplier test is equivalent, for large samples, to the likelihood ratio test.


6. Singular information matrices. As we have said previously, the whole problem of maximum likelihood estimation is closely bound up with the behavior of the function $z$. In particular, for unrestricted estimation it is important that $z$ should have a maximum turning value in $\Omega$ at $\theta_0$. This condition plays an important part in ensuring the consistency of $\hat\theta_n(\cdot,\Omega)$. The demands that $z(\theta_0)$ be a maximum turning value of $z$ in $\Omega$ and that $B_{\theta_0}$ be positive definite are related. It is usually true that $z$ has a stationary value at $\theta_0$, that is, $Dz(\theta_0) = 0$, and also that $D^2z(\theta_0) = -B_{\theta_0}$. These results depend only on $f$ being such that we can "differentiate under the integral sign." So if $\theta$ is near $\theta_0$ we will usually have

$$z(\theta) - z(\theta_0) = -\tfrac12(\theta - \theta_0)'B_{\theta_0}(\theta - \theta_0) + O(\|\theta - \theta_0\|^3). \tag{6.1}$$

Hence, if $B_{\theta_0}$ is not positive definite, $z(\theta_0)$ may very well fail to be a maximum turning value of $z$ in $\Omega$. Much of unrestricted estimation theory would then break down.

However, even if $B_{\theta_0}$ is not positive definite and $z(\theta_0)$ is not a maximum turning value of $z$ in $\Omega$, it may still be the case that $\theta_0$ belongs to the subset $\omega$ of $\Omega$ and that $z(\theta_0)$ is a maximum turning value of $z$ in $\omega$. Restricted estimation theory may then not need drastic revision, and it is of some theoretical interest to consider just what revision is necessary in this case. Moreover this problem is of





400 Ss. D. SILVEY 


practical interest, because it is often natural, for reasons of symmetry or for some other reason, to describe the distribution of a random variable in terms of a parameter $\theta$ in such a way that $B_{\theta_0}$ is not positive definite and $z(\theta_0)$ is not a maximum of $z$ in $\Omega$. For instance, suppose $X$ has a multinomial distribution and describes an experiment in which an individual can fall into any one of $s$ classes. It is then natural, for reasons of symmetry, to denote the probabilities associated with the different classes by $\theta_i/\sum_{j=1}^{s}\theta_j$ $(i = 1, 2, \ldots, s)$. The set $\Omega$ of possible parameters is $\{\theta \in R^s: \theta_i > 0\ (i = 1, 2, \ldots, s)\}$. It is easy to verify that $B_\theta$ is not positive definite for any $\theta$ in $\Omega$, and that $z(\theta_0)$ is not a maximum turning value of $z$ in $\Omega$. (Here the reason is clear: we have set in $s$-dimensional space a parameter that is really $(s - 1)$-dimensional.) However, there is obviously no difficulty about restricted estimation in the subset of $\Omega$ in which $\sum_{i=1}^{s}\theta_i = 1$.
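The multinomial example can be checked directly. The following Python sketch is an illustration and is not part of the original argument. It assumes a single trial with log-density $\log f(x, \theta) = \sum_i x_i\log\theta_i - \log S$, where $S = \sum_j\theta_j$. For this model minus the expected Hessian is $B_\theta = \mathrm{diag}\{1/(\theta_iS)\} - S^{-2}\mathbf{1}\mathbf{1}'$. The helper names (`info_matrix`, `rank`, `z`) and the point `theta0` are hypothetical. The sketch confirms three things:

- $B_\theta\theta = 0$;
- $B_\theta$ has rank $s - 1$;
- $z$ is constant along the ray through $\theta_0$.

```python
import math
from fractions import Fraction


def info_matrix(theta):
    """B_theta for one multinomial trial with p_i = theta_i / S, where S = sum(theta).

    Entry (i, j) is [i == j] / (theta_i * S) - 1 / S**2, i.e. minus the expected
    Hessian of log f(x, theta) = sum_i x_i log theta_i - log S.
    """
    S = sum(theta)
    s = len(theta)
    return [[(1 / (theta[i] * S) if i == j else 0) - 1 / S**2 for j in range(s)]
            for i in range(s)]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(row) for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def z(theta, theta0):
    """z(theta) = E_{theta0} log f(X, theta) in the same model."""
    S, S0 = sum(theta), sum(theta0)
    return sum(float(t0 / S0) * math.log(t / S) for t, t0 in zip(theta, theta0))


theta0 = [Fraction(1), Fraction(2), Fraction(3), Fraction(4)]  # hypothetical point
B = info_matrix(theta0)
B_times_theta0 = [sum(b * t for b, t in zip(row, theta0)) for row in B]

print(rank(B))                              # 3, i.e. s - 1
print(all(x == 0 for x in B_times_theta0))  # True
# z is constant along the ray {c * theta0}, so z(theta0) is not a strict maximum in Omega.
print(math.isclose(z([2 * t for t in theta0], theta0), z(theta0, theta0)))  # True
```

Exact `Fraction` arithmetic is used so that the singularity shows up as an exact zero rather than a small floating-point residue.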

We will now consider what revision is necessary in that part of the foregoing theory which rests on the assumptions A. We drop the demand that $B_{\theta_0}$ be positive definite (assumption 11A) and replace assumption 6A by the following assumption 6B, while maintaining the remainder of the assumptions A.

Assumption 6B. $\theta_0 \in \omega$, and for any other point $\theta$ of $\omega$, $F(t, \theta) \ne F(t, \theta_0)$ for at least one $t$. Roughly speaking, the reason for introducing assumption 6B is as follows. If assumption 6A is not satisfied, the parameter is not identifiable in the set $\Omega$, i.e., different $\theta$'s in $\Omega$ give the same distribution of $X$. However, we wish $\theta_0$ to be identifiable in the subset $\omega$, so that restricted estimation is still possible. Hence we make assumption 6B.

It is easy to verify that these assumptions imply the following:

- there exists a consistent estimator $\hat\theta_n(\cdot, \omega)$ of $\theta_0$;
- for almost any $x$ and sufficiently large $n$, $\hat\theta_n(x, \omega)$, together with a Lagrangian multiplier $\hat\lambda_n(x)$, satisfies the restricted likelihood equations;
- for almost any $x$,

$$\begin{bmatrix} B_{\theta_0} + o(1) & -H_{\theta_0} + o(1) \\ -H_{\theta_0}' + o(1) & 0 \end{bmatrix}\begin{bmatrix} \hat\theta_n(x, \omega) - \theta_0 \\ \hat\lambda_n(x) \end{bmatrix} = \begin{bmatrix} \frac{1}{n}D\log L_n(x, \theta_0) \\ 0 \end{bmatrix}. \tag{6.2}$$

We have now dropped the requirement that $B_{\theta_0}$ be positive definite. The subsequent theory concerning the asymptotic distributions of $\hat\theta_n(\cdot, \omega)$, $\hat\lambda_n$ and associated random variables makes considerable use of the inverse of $B_{\theta_0}$, so that theory no longer applies. To replace it we now introduce assumption 11B. It is associated with assumption 6B in the same manner as 11A was shown at the beginning of this section to be associated with 6A. Assumption 11B provides a natural connection between three things:

- the properties of the matrix $B_{\theta_0}$;
- the subset $\omega$;
- the facts that $\theta_0$ is identifiable when it is known to belong to $\omega$ (assumption 6B), but unidentifiable in $\Omega$.

Assumption 11B. The matrix $H_{\theta_0}$ is of rank $r$. The matrix $B_{\theta_0}$ is of rank $s - t$, where $t < r$. There exists an $s \times t$ sub-matrix $H_1$ of $H_{\theta_0}$ such that $B_{\theta_0} + H_1H_1'$ is positive definite. (Without any loss of generality we may assume that $H_1$ is





LAGRANGIAN MULTIPLIER TEST 


the matrix composed of the first $t$ columns of $H_{\theta_0}$, and we may write $H_{\theta_0} = [H_1 \;\; H_2]$.)


We will now define the set of assumptions B. 


Assumptions B. By assumptions B we will mean the set of assumptions A 
with 6B and 11B replacing 6A and 11A respectively. 

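Now suppose assumptions B hold. Let $y$ denote an $s$-dimensional random vector normally distributed with mean 0 and variance matrix $B_{\theta_0}$. Write $\hat\theta$ in place of $\hat\theta_n(\cdot, \omega)$ and $\hat\lambda$ in place of $\hat\lambda_n(\cdot)$. Then from (6.2) we have, as before,

$$\sqrt{n}\begin{bmatrix} B_{\theta_0} & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix} \sim \begin{bmatrix} y \\ 0 \end{bmatrix}. \tag{6.3}$$

Since $\sqrt{n}\,H_{\theta_0}'(\hat\theta - \theta_0) \sim 0$, we have in particular $\sqrt{n}\,H_1H_1'(\hat\theta - \theta_0) \sim 0$. It follows that

$$\sqrt{n}\begin{bmatrix} B_{\theta_0} + H_1H_1' & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix} \sim \begin{bmatrix} y \\ 0 \end{bmatrix}. \tag{6.4}$$

Since $B_{\theta_0} + H_1H_1'$ is positive definite and $H_{\theta_0}$ is of rank $r$, the matrix

$$\begin{bmatrix} B_{\theta_0} + H_1H_1' & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}$$

is non-singular. We define $P_{\theta_0}$, $Q_{\theta_0}$ and $R_{\theta_0}$ by

$$\begin{bmatrix} P_{\theta_0} & Q_{\theta_0} \\ Q_{\theta_0}' & R_{\theta_0} \end{bmatrix} = \begin{bmatrix} B_{\theta_0} + H_1H_1' & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}^{-1}. \tag{6.5}$$

We also define $S_{\theta_0}$ by

$$S_{\theta_0} = -R_{\theta_0} - \begin{bmatrix} I_t & 0 \\ 0 & 0 \end{bmatrix}, \tag{6.6}$$

where $I_t$ denotes the $t \times t$ unit matrix.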


We will now prove two lemmas concerning the distributions of statistics in 
which we are interested. 
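Lemma 6. Subject to assumptions B, the vector

$$\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}$$

is asymptotically normally distributed with mean 0 and variance matrix

$$\begin{bmatrix} P_{\theta_0} & 0 \\ 0 & S_{\theta_0} \end{bmatrix}.$$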







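Proof. From (6.4) we have, as previously, that

$$\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}$$

is asymptotically normal with mean 0 and variance matrix

$$\begin{bmatrix} P_{\theta_0}B_{\theta_0}P_{\theta_0} & P_{\theta_0}B_{\theta_0}Q_{\theta_0} \\ Q_{\theta_0}'B_{\theta_0}P_{\theta_0} & Q_{\theta_0}'B_{\theta_0}Q_{\theta_0} \end{bmatrix}.$$

Now

$$P_{\theta_0}B_{\theta_0}P_{\theta_0} = P_{\theta_0}(B_{\theta_0} + H_1H_1')P_{\theta_0} - P_{\theta_0}H_1H_1'P_{\theta_0},$$

and, as previously, the first term on the right-hand side of this equation is $P_{\theta_0}$. Also, from (6.5),

$$P_{\theta_0}H_{\theta_0} = 0,$$

and in particular $P_{\theta_0}H_1 = 0$. It follows that $P_{\theta_0}B_{\theta_0}P_{\theta_0} = P_{\theta_0}$. In a similar manner it may be shown that $P_{\theta_0}B_{\theta_0}Q_{\theta_0} = 0$. We also have

$$Q_{\theta_0}'B_{\theta_0}Q_{\theta_0} = -R_{\theta_0} - Q_{\theta_0}'H_1H_1'Q_{\theta_0}.$$

From (6.5), $Q_{\theta_0}'H_{\theta_0} = -I_r$, so that, in particular, $Q_{\theta_0}'H_1 = -[I_t \;\; 0]'$. It follows that

$$Q_{\theta_0}'B_{\theta_0}Q_{\theta_0} = -R_{\theta_0} - \begin{bmatrix} I_t & 0 \\ 0 & 0 \end{bmatrix} = S_{\theta_0},$$

and this completes the proof.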
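The identities used in this proof can be checked with exact arithmetic in the $2 \times 2$ contingency-table example of Section 7, where $s = 4$, $r = 3$ and $t = 2$. The following Python sketch is an illustration and not part of the paper. It takes the matrices $B_\theta$, $H_\theta$ and $H_1$ of that example at the hypothetical point $\theta = (1/3, 2/3, 1/3, 2/3)$ of $\omega$, inverts the bordered matrix of (6.5), and checks the following:

- $PBP = P$, $PH = 0$, $Q'H = -I_r$ and $PBQ = 0$;
- $Q'BQ = S$, with $S$ as in (6.6);
- $H'(B + H_1H_1')^{-1}H = -R^{-1}$, the relation used at the end of the proof of Lemma 7.

The helper names are hypothetical.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def tr(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse over the rationals (raises StopIteration if a is singular)."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    return [[-x for x in row] for row in a]


def zeros(m, n):
    return [[F(0)] * n for _ in range(m)]


# 2 x 2 table example at a point of omega (theta1 + theta2 = theta3 + theta4 = 1, theta1 = theta3).
t1, t2, t3, t4 = F(1, 3), F(2, 3), F(1, 3), F(2, 3)
B = [[1 / t1 - 1 / (t1 + t2), -1 / (t1 + t2), 0, 0],
     [-1 / (t1 + t2), 1 / t2 - 1 / (t1 + t2), 0, 0],
     [0, 0, 1 / t3 - 1 / (t3 + t4), -1 / (t3 + t4)],
     [0, 0, -1 / (t3 + t4), 1 / t4 - 1 / (t3 + t4)]]
H = [[F(1), F(0), F(1)], [F(1), F(0), F(0)], [F(0), F(1), F(-1)], [F(0), F(1), F(0)]]
H1 = [row[:2] for row in H]
A = add(B, matmul(H1, tr(H1)))  # B + H1 H1'
s, r, t = 4, 3, 2

# Bordered matrix of (6.5) and its inverse [[P, Q], [Q', R]].
M = [A[i] + [-h for h in H[i]] for i in range(s)] + \
    [[-H[i][j] for i in range(s)] + [F(0)] * r for j in range(r)]
Minv = inverse(M)
P = [row[:s] for row in Minv[:s]]
Q = [row[s:] for row in Minv[:s]]
R = [row[s:] for row in Minv[s:]]
I_r = [[F(int(i == j)) for j in range(r)] for i in range(r)]
J = [[F(int(i == j and i < t)) for j in range(r)] for i in range(r)]  # diag(I_t, 0)
S = add(neg(R), J, sign=-1)                                           # (6.6)

checks = {
    "PBP = P": matmul(matmul(P, B), P) == P,
    "PH = 0": matmul(P, H) == zeros(s, r),
    "Q'H = -I_r": matmul(tr(Q), H) == neg(I_r),
    "PBQ = 0": matmul(matmul(P, B), Q) == zeros(s, r),
    "Q'BQ = S": matmul(matmul(tr(Q), B), Q) == S,
    "H'A^-1 H = -R^-1": matmul(matmul(tr(H), inverse(A)), H) == neg(inverse(R)),
}
print(all(checks.values()))  # True
```

Because the arithmetic is exact, each identity is checked as an equality rather than up to rounding error.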




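Lemma 7. Subject to assumptions B, $-n\hat\lambda'R_{\hat\theta}^{-1}\hat\lambda$ is asymptotically distributed as $\chi^2$ with $r - t$ degrees of freedom.

Proof. Since $B_{\theta_0} + H_1H_1'$ is positive definite and $B_{\theta_0}$ is of rank $s - t$, there exists a non-singular matrix $W$ such that

$$W'(B_{\theta_0} + H_1H_1')W = I_s, \qquad W'B_{\theta_0}W = \begin{bmatrix} \Lambda_{s-t} & 0 \\ 0 & 0 \end{bmatrix},$$

where $\Lambda_{s-t}$ is a diagonal $(s - t) \times (s - t)$ matrix. Then

$$W'H_1H_1'W = \begin{bmatrix} I_{s-t} - \Lambda_{s-t} & 0 \\ 0 & I_t \end{bmatrix}.$$

Since $H_1H_1'$ is of rank $t$, it follows that $\Lambda_{s-t} = I_{s-t}$ and that

$$W'H_1H_1'W = \begin{bmatrix} 0 & 0 \\ 0 & I_t \end{bmatrix}.$$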







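We now define an $s$-dimensional random variable $m = (m_1, m_2, \ldots, m_s)'$ by $m = W'y$. Then $m$ is normally distributed with mean 0 and variance matrix

$$W'B_{\theta_0}W = \begin{bmatrix} I_{s-t} & 0 \\ 0 & 0 \end{bmatrix}.$$

It follows that $m_1, m_2, \ldots, m_{s-t}$ are independent $N(0, 1)$ random variables, while $m_{s-t+1} = m_{s-t+2} = \cdots = m_s = 0$.
Now $B_{\theta_0} + H_1H_1' = (W')^{-1}W^{-1}$ and $y = (W')^{-1}m$, so from (6.4) we have

$$\sqrt{n}\begin{bmatrix} (W')^{-1}W^{-1} & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix} \sim \begin{bmatrix} (W')^{-1}m \\ 0 \end{bmatrix},$$

and so

$$\sqrt{n}\begin{bmatrix} W^{-1} & -W'H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix} \sim \begin{bmatrix} m \\ 0 \end{bmatrix}. \tag{6.7}$$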


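Hence

$$m'm \sim n\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}'\begin{bmatrix} B_{\theta_0} + H_1H_1' & -H_{\theta_0} \\ -H_{\theta_0}' & H_{\theta_0}'WW'H_{\theta_0} \end{bmatrix}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix},$$

i.e., since $\sqrt{n}\,H_{\theta_0}'(\hat\theta - \theta_0) \sim 0$,

$$m'm \sim n(\hat\theta - \theta_0)'B_{\theta_0}(\hat\theta - \theta_0) + n\hat\lambda'H_{\theta_0}'WW'H_{\theta_0}\hat\lambda. \tag{6.8}$$

Now from (6.4), $\sqrt{n}(\hat\theta - \theta_0) \sim P_{\theta_0}(W')^{-1}m$, and, as previously, $P_{\theta_0}$ is of rank $s - r$. Hence, asymptotically, when $n(\hat\theta - \theta_0)'B_{\theta_0}(\hat\theta - \theta_0)$ is expressed as a quadratic form in $m_1, m_2, \ldots, m_{s-t}$, its rank is at most $s - r$. We will now show that when $n\hat\lambda'H_{\theta_0}'WW'H_{\theta_0}\hat\lambda$ is expressed as a quadratic form in $m_1, m_2, \ldots, m_{s-t}$, its rank is at most $r - t$.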
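From (6.7) we have, again since $\sqrt{n}\,H_{\theta_0}'(\hat\theta - \theta_0) \sim 0$,

$$-H_{\theta_0}'Wm \sim \sqrt{n}\,H_{\theta_0}'WW'H_{\theta_0}\hat\lambda.$$

Now

$$H_{\theta_0}'Wm = \begin{bmatrix} H_1'Wm \\ H_2'Wm \end{bmatrix},$$

and since

$$m'W'H_1H_1'Wm = m'\begin{bmatrix} 0 & 0 \\ 0 & I_t \end{bmatrix}m = m_{s-t+1}^2 + \cdots + m_s^2 = 0,$$

we have $H_1'Wm = 0$. Hence

$$\sqrt{n}\,H_{\theta_0}'WW'H_{\theta_0}\hat\lambda \sim -\begin{bmatrix} 0 \\ H_2'Wm \end{bmatrix}.$$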







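Since the rank of $H_2$ is at most $r - t$, it follows that, asymptotically, when $n\hat\lambda'H_{\theta_0}'WW'H_{\theta_0}\hat\lambda$ is expressed as a quadratic form in $m_1, m_2, \ldots, m_{s-t}$, its rank is at most $r - t$. Now $m'm$ is distributed as $\chi^2$ with $s - t$ degrees of freedom, and the two ranks add up to at most $(s - r) + (r - t) = s - t$. So from (6.8), applying Cochran's Theorem (Cramér [4]), we find that asymptotically $n(\hat\theta - \theta_0)'B_{\theta_0}(\hat\theta - \theta_0)$ and

$$n\hat\lambda'H_{\theta_0}'WW'H_{\theta_0}\hat\lambda$$

are independently distributed as $\chi^2$ with $s - r$ and $r - t$ degrees of freedom respectively.
The proof of Lemma 7 is completed by two remarks. First, since $WW' = (B_{\theta_0} + H_1H_1')^{-1}$, (6.5) gives

$$H_{\theta_0}'WW'H_{\theta_0} = H_{\theta_0}'(B_{\theta_0} + H_1H_1')^{-1}H_{\theta_0} = -R_{\theta_0}^{-1}.$$

Second, $R_{\hat\theta}^{-1} \sim R_{\theta_0}^{-1}$.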

The results proved in this section, and the methods of proof, make two things clear when the matrix $B_{\theta_0}$ is singular and the function $h$ is suitable. First, the technique suggested by Aitchison and Silvey [1] for solving the restricted likelihood equations can usually be adapted. Second, the Lagrangian multiplier test can usually be applied. We will not amplify this point.


7. Different numbers of observations on several random variables. Experimental material being what it is, and experimenters being as they are, the statistician is not often faced with an estimation problem in the ideal circumstances of being given a number of observations on a vector valued random variable. The more usual situation confronting him is that he is given:

- $n_1$ observations on a random variable $X_1$ whose probability density function depends on $s_1$ parameters $\theta_1, \theta_2, \ldots, \theta_{s_1}$;
- $n_2$ observations on a random variable $X_2$ whose probability density function depends on $s_2$ parameters $\theta_{s_1+1}, \theta_{s_1+2}, \ldots, \theta_{s_1+s_2}$;
- and so on, up to $n_k$ observations on a random variable $X_k$ whose probability density function depends on $s_k$ parameters $\theta_{s_1+\cdots+s_{k-1}+1}, \ldots, \theta_s$.

Here $s = s_1 + s_2 + \cdots + s_k$. He is then presented with the problem of deciding whether the true parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_s)$ belongs to a set

$$\omega = \{\theta \in \Omega: h(\theta) = 0\},$$

$\Omega$ and $h$ being as before. If $n_1 = n_2 = \cdots = n_k$, then we may interpret the observations as observations on a vector valued random variable, and the foregoing theory applies. But if the $n_i$'s are not all equal we cannot do this, and to enlarge the sphere of the Lagrangian multiplier test we have to consider this situation separately. In discussing it we will avoid all mathematical detail and will be content to indicate very briefly the modifications necessary in the test.

We will denote by $x^*$ a given set of $n_1 + n_2 + \cdots + n_k$ observations on the random variables $X_1, X_2, \ldots, X_k$, and $\log L(x^*, \theta)$ will denote the value of the log-likelihood function at the point $\theta$. Now if $\theta_0 \in \omega$, the same kind of argument as we have used before may be used to show that it will usually be the case that $\hat\theta(x^*, \omega)$ exists, is near $\theta_0$ when $n_1, n_2, \ldots, n_k$ are all







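large, and, together with a Lagrangian multiplier $\hat\lambda(x^*)$, satisfies the likelihood equations

$$D\log L(x^*, \theta) + H_\theta\lambda = 0, \qquad h(\theta) = 0.$$

We now introduce a matrix $N$ defined by

$$N = \begin{bmatrix} n_1I_{s_1} & 0 & \cdots & 0 \\ 0 & n_2I_{s_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & n_kI_{s_k} \end{bmatrix}.$$

The information matrix $B_\theta$ is defined in this case by

$$B_\theta = -N^{-1}E_\theta[D^2\log L(\cdot, \theta)],$$

where $E_\theta$ denotes expected value when $\theta$ is taken as the true parameter. Then, again by the type of argument used previously, we may show that for most $x^*$, when $n_1, n_2, \ldots, n_k$ are large,

$$\begin{bmatrix} NB_{\theta_0} & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{bmatrix}\begin{bmatrix} \hat\theta(x^*, \omega) - \theta_0 \\ \hat\lambda(x^*) \end{bmatrix} \sim \begin{bmatrix} D\log L(x^*, \theta_0) \\ 0 \end{bmatrix}. \tag{7.1}$$

It will also usually be true that $D\log L(x^*, \theta_0)$ can be regarded as an observation on a random variable which is approximately normal with mean 0 and variance matrix $NB_{\theta_0}$.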

Now suppose $B_{\theta_0}$ is positive definite. Then we may use (7.1) in the same way as before to show that when $\theta_0 \in \omega$ and $n_1, n_2, \ldots, n_k$ are large,

$$\hat\lambda'H_{\hat\theta}'(NB_{\hat\theta})^{-1}H_{\hat\theta}\hat\lambda$$

will usually be distributed approximately as $\chi^2$ with $r$ degrees of freedom. It is this statistic which we use in the modified form of the Lagrangian multiplier test.

Alternatively, suppose that:

- $B_{\theta_0}$ is of rank $s - t$;
- each of the functions $h_1, h_2, \ldots, h_t$ is a function of only the parameters involved in the distribution of one of the $X$'s;
- $B_{\theta_0} + H_1H_1'$ is positive definite.

Then the statistic on which the test is based is $\hat\lambda'H_{\hat\theta}'[N(B_{\hat\theta} + H_1H_1')]^{-1}H_{\hat\theta}\hat\lambda$. It will usually be distributed approximately as $\chi^2$ with $r - t$ degrees of freedom when $n_1, n_2, \ldots, n_k$ are large.

We conclude by applying the Lagrangian multiplier test in a familiar situation.

Homogeneity in the $2 \times 2$ contingency table. One of the three situations (Cochran [2]) in which the $2 \times 2$ contingency table arises is as follows. We are given $n_1$ observations on a random variable $X_1$ whose distribution is defined by

$$\Pr\{X_1 = (1, 0)\} = \theta_1/(\theta_1 + \theta_2), \qquad \Pr\{X_1 = (0, 1)\} = \theta_2/(\theta_1 + \theta_2),$$







and $n_2$ observations on an independent random variable $X_2$ whose distribution is defined similarly in terms of $\theta_3$ and $\theta_4$. These observations can be summarised in a $2 \times 2$ contingency table as follows.

Number of occurrences of different values of $X_1$ and $X_2$.

$$\begin{array}{c|cc|c} & (1, 0) & (0, 1) & \text{Total} \\ \hline X_1 & n_{11} & n_{12} & n_1 \\ X_2 & n_{21} & n_{22} & n_2 \\ \hline \text{Total} & m_1 & m_2 & n \end{array}$$

We suppose that the point $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)$ is known to belong to the set

$$\Omega = \{\theta \in R^4: \epsilon \le \theta_i \le 1/\epsilon\ (i = 1, 2, 3, 4)\},$$

where $\epsilon$ is a small positive number. In this case we also have

$$\log L(x^*, \theta) = \text{constant} + n_{11}\log\theta_1 + n_{12}\log\theta_2 - n_1\log(\theta_1 + \theta_2) + n_{21}\log\theta_3 + n_{22}\log\theta_4 - n_2\log(\theta_3 + \theta_4).$$

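The matrix

$$B_\theta = \begin{bmatrix} \theta_1^{-1} - (\theta_1+\theta_2)^{-1} & -(\theta_1+\theta_2)^{-1} & 0 & 0 \\ -(\theta_1+\theta_2)^{-1} & \theta_2^{-1} - (\theta_1+\theta_2)^{-1} & 0 & 0 \\ 0 & 0 & \theta_3^{-1} - (\theta_3+\theta_4)^{-1} & -(\theta_3+\theta_4)^{-1} \\ 0 & 0 & -(\theta_3+\theta_4)^{-1} & \theta_4^{-1} - (\theta_3+\theta_4)^{-1} \end{bmatrix}$$

has rank 2. Homogeneity of $X_1$ and $X_2$ means that $\theta_1/(\theta_1 + \theta_2) = \theta_3/(\theta_3 + \theta_4)$. We therefore consider estimating $\theta$ subject to the restrictions

$$h(\theta) = \begin{bmatrix} \theta_1 + \theta_2 - 1 \\ \theta_3 + \theta_4 - 1 \\ \theta_1 - \theta_3 \end{bmatrix} = 0,$$

so that

$$H_\theta = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{bmatrix}.$$

If $H_1$ is the leading $4 \times 2$ sub-matrix of $H_\theta$, then for any $\theta \in \omega$

$$B_\theta + H_1H_1' = \begin{bmatrix} \theta_1^{-1} & 0 & 0 & 0 \\ 0 & \theta_2^{-1} & 0 & 0 \\ 0 & 0 & \theta_3^{-1} & 0 \\ 0 & 0 & 0 & \theta_4^{-1} \end{bmatrix},$$

which is positive definite. Thus assumption 11B holds here with $s = 4$, $r = 3$ and $t = 2$, so that $r - t = 1$.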







The likelihood equations are easily solved in this case, and we find that

$$\hat\theta_1(x^*, \omega) = \hat\theta_3(x^*, \omega) = m_1/n,$$

while $\hat\theta_2(x^*, \omega) = \hat\theta_4(x^*, \omega) = m_2/n$. It is not difficult to verify that the statistic $\hat\lambda'H_{\hat\theta}'[N(B_{\hat\theta} + H_1H_1')]^{-1}H_{\hat\theta}\hat\lambda$ is the usual statistic used in the $\chi^2$-test of homogeneity in a $2 \times 2$ table. This test is therefore a particular case of the Lagrangian multiplier test, and the example illustrates most aspects of the preceding theory. The computational procedure for applying the Lagrangian multiplier test in less familiar and more complicated situations will be set out in a subsequent paper.
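The claim that the statistic reduces to the usual $\chi^2$ statistic can be checked mechanically. The reduction goes in two steps:

1. The likelihood equations give $H_{\hat\theta}\hat\lambda = -D\log L(x^*, \hat\theta)$.
2. On $\omega$, $N(B_{\hat\theta} + H_1H_1') = \mathrm{diag}(n_1/\hat\theta_1, n_1/\hat\theta_2, n_2/\hat\theta_3, n_2/\hat\theta_4)$.

So the statistic is a weighted sum of squared score components. The Python sketch below is illustrative only: the function names and the cell counts are hypothetical. It compares this weighted sum with Pearson's $\sum(O - E)^2/E$ in exact rational arithmetic.

```python
from fractions import Fraction as F


def lm_statistic(n11, n12, n21, n22):
    """lambda' H' [N(B + H1 H1')]^{-1} H lambda at the restricted MLE."""
    n1, n2 = n11 + n12, n21 + n22  # row totals
    m1, m2 = n11 + n21, n12 + n22  # column totals
    n = n1 + n2
    # Restricted MLE: theta1 = theta3 = m1/n, theta2 = theta4 = m2/n.
    th = [F(m1, n), F(m2, n), F(m1, n), F(m2, n)]
    # Score D log L(x*, theta-hat); theta1 + theta2 = theta3 + theta4 = 1 there.
    score = [n11 / th[0] - n1, n12 / th[1] - n1, n21 / th[2] - n2, n22 / th[3] - n2]
    # H lambda = -score, and N(B + H1 H1') is diagonal with entries n_g / theta_i,
    # so the quadratic form is a weighted sum of squared score components.
    group = [n1, n1, n2, n2]
    return sum(d * d * t / g for d, t, g in zip(score, th, group))


def pearson_x2(n11, n12, n21, n22):
    """Usual chi-square statistic sum (O - E)^2 / E with E = row * column / n."""
    table = [[n11, n12], [n21, n22]]
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    n = sum(rows)
    return sum(F((n * table[i][j] - rows[i] * cols[j]) ** 2, n * rows[i] * cols[j])
               for i in range(2) for j in range(2))


counts = (10, 20, 30, 15)  # hypothetical cell counts n11, n12, n21, n22
print(lm_statistic(*counts) == pearson_x2(*counts))  # True
print(lm_statistic(*counts))                         # 225/28
```

Term by term, $(D_i\log L)^2\hat\theta_i/n_g$ equals $(nO - RC)^2/(nRC)$ for the corresponding cell, where $O$ is the cell count and $R$, $C$ are its row and column totals. That identity is why the two functions agree exactly.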


REFERENCES
[1] J. Aitchison and S. D. Silvey, "Maximum likelihood estimation of parameters subject to restraints," Ann. Math. Stat., Vol. 29 (1958), pp. 813-828.
[2] W. G. Cochran, "The χ²-test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952), pp. 315-345.
[3] H. Cramér, Random Variables and Probability Distributions, Cambridge University Press, 1937.
[4] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[5] C. Kraft and L. LeCam, "A remark on the roots of the maximum likelihood equation," Ann. Math. Stat., Vol. 27 (1956), pp. 1174-1177.
[6] H. B. Mann and A. Wald, "On stochastic limit and order relationships," Ann. Math. Stat., Vol. 14 (1943), pp. 217-226.
[7] A. Wald, "Note on the consistency of the maximum likelihood estimate," Ann. Math. Stat., Vol. 20 (1949), pp. 595-601.
[8] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Trans. Am. Math. Soc., Vol. 54 (1943), pp. 426-482.
















































INTUITIVE PROBABILITY ON FINITE SETS¹
By Charles H. Kraft, John W. Pratt, and A. Seidenberg


Michigan State University, University of Chicago and Harvard University, 
and University of California (Berkeley) 


1. Introduction. Let $x_1, \ldots, x_n$ be the distinct elements of a set $S$. Assign nonnegative numbers $v(x_i)$ to the $x_i$, and assign $v(x_{i_1}) + \cdots + v(x_{i_k})$ to the set $\{x_{i_1}, \ldots, x_{i_k}\}$. This gives an ordering of the subsets of $S$: the subsets are ordered according to the values just assigned.² We denote by $v(\alpha)$ the value assigned to $\alpha$, and write $\alpha \preceq \beta$ if $v(\alpha) \le v(\beta)$. For this ordering the following conditions obtain:

Comparability (C): For any $\alpha, \beta$, $\alpha \preceq \beta$ or $\beta \preceq \alpha$ (or both).

Transitivity (T): $\alpha \preceq \beta$ and $\beta \preceq \gamma$ implies $\alpha \preceq \gamma$.

Additivity (A): Let $\gamma$ be disjoint from $\alpha$ and $\beta$; then $\alpha \preceq \beta$ if and only if $\alpha \cup \gamma \preceq \beta \cup \gamma$.

Also $\emptyset \preceq \gamma$ for every $\gamma$, where $\emptyset$ is the empty set.
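As a small illustration (not from the paper), one can confirm by brute force that an ordering induced by a measure satisfies (C), (T), (A) and $\emptyset \preceq \gamma$. The Python sketch below encodes a subset of $S = \{x_1, \ldots, x_4\}$ as a bitmask. The values in `WEIGHTS` are a hypothetical measure; integers are used so that no floating-point rounding can create spurious ties.

```python
from itertools import product

# Hypothetical measure on S = {x1, x2, x3, x4}: integer values avoid rounding ties.
WEIGHTS = [1, 2, 3, 4]
SUBSETS = range(1 << len(WEIGHTS))  # bit i set <=> x_{i+1} belongs to the subset


def v(alpha):
    """Measure of the subset encoded by the bitmask alpha."""
    return sum(w for i, w in enumerate(WEIGHTS) if (alpha >> i) & 1)


def leq(alpha, beta):
    """alpha <= beta in the ordering induced by the measure."""
    return v(alpha) <= v(beta)


comparability = all(leq(a, b) or leq(b, a) for a, b in product(SUBSETS, repeat=2))
transitivity = all(leq(a, c) for a, b, c in product(SUBSETS, repeat=3)
                   if leq(a, b) and leq(b, c))
additivity = all(leq(a, b) == leq(a | c, b | c)
                 for a, b, c in product(SUBSETS, repeat=3)
                 if (a & c) == 0 and (b & c) == 0)
empty_is_least = all(leq(0, c) for c in SUBSETS)
print(comparability, transitivity, additivity, empty_is_least)  # True True True True
```

The converse question, whether every ordering with these properties comes from some measure in this way, is exactly what the paper goes on to answer in the negative.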

Let $T$ be the set of subsets of $S$. We shall say that an ordering of $T$ obtained by assigning values to the $x_i$ arises from a measure. Conversely, B. de Finetti [1] (see also [4], p. 40) has asked whether every ordering of $T$ subject to the above conditions arises from a measure, and has conjectured that it does. We show by a counter-example that the conjecture is false for $n = 5$. In Theorem 2 we give a necessary and sufficient condition for an ordering to arise from a measure. The proof includes a procedure for checking, in a finite number of steps, whether the condition obtains.

The connection with intuitive probability (i.e., the axiomatic theory of probability) is as follows. One has $n$ incompatible events $x_1, \ldots, x_n$. One supposes that one can confront the disjunction of any subset of them with the disjunction of any other, and can say (or judge) whether they are equally likely, and if not, which is the more likely. Thus one has a transitive ordering of $T$. Moreover, this ordering is subject to the additivity condition (and, if one likes, to any further conditions similar to the above which obtain for an ordering arising from a measure). The question then is whether one can assign a numerical probability to each event $x_i$ so that the corresponding ordering of $T$ coincides with the given ordering; in other words, whether there exists a strictly agreeing measure. As said, the answer is no.


Received March 14, 1958; revised November 22, 1958. 

¹ Prepared with partial support of the Office of Naval Research to the first two named authors. This paper may be reproduced in whole or in part for any purpose of the United States Government.

² By an ordering of a set $S$ we mean an arbitrary, possibly empty, subset of the Cartesian product $S \times S$, that is, an arbitrary set of ordered pairs $(a, b)$ with $a, b$ elements of $S$. If $(a, b)$ is such a pair, we write $a \preceq b$. An ordering is sometimes also called a relation.


408 





INTUITIVE PROBABILITY ON FINITE SETS 409 


In ([1], Section 3, p. 3), de Finetti suggests that if the answer to his conjecture should be no, then this is because the "right" axioms have not been put down. In Theorem 5 we show that if we subject our judgment to certain conditions of the same general character as (C), (T) and (A), then we will in fact reject any ordering which does not arise from a measure. The counter-example is thus only a partial answer to de Finetti's conjecture, and Theorem 5 completes the answer.

The question of almost agreeing measures (see definition below) is also taken 
up. A counter-example is given to show that an ordering can be subject to (C), 
(T), (A) without having any almost agreeing measure. 

For a systematic treatment of intuitive probability see [4] and the literature 
there cited, in particular, [3]. 


2. Preliminaries. We designate the subsets of $S$ multiplicatively: for example, $x_1x_2x_3$ is the set consisting of the elements $x_1, x_2, x_3$. The empty set is designated by 1. The set $T$ of subsets of $S$ is thus identified with the monomials in $n$ indeterminates $x_1, \ldots, x_n$ in which the exponents are 0 or 1. In the standard terminology for polynomials, the intersection $\delta$ of two sets $\alpha, \beta$ is their greatest common divisor. The product $\alpha\beta$ need not be in $T$; in fact, it is in $T$ if and only if $\delta = 1$. The union of two sets $\alpha, \beta$ is $\alpha\beta/\delta$.

In addition to the monomials in $T$, it is convenient to consider the group $G$ of monomials $x_1^{t_1}\cdots x_n^{t_n}$, with $t_1, \ldots, t_n$ arbitrary integers. We extend the measure $v$ on $T$ to $G$ in such a way that $v(\alpha\beta) = v(\alpha) + v(\beta)$. There is, in fact, one and only one way to make this extension, namely

$$v(x_1^{t_1}\cdots x_n^{t_n}) = t_1v(x_1) + \cdots + t_nv(x_n).$$

We will call a mapping $\alpha \to v(\alpha)$ of $G$ into the additive group of real numbers for which $v(\alpha\beta) = v(\alpha) + v(\beta)$ a valuation. There is thus a 1-1 correspondence between measures and the valuations in which $v(x_i) \ge 0$ for every $i$. Such valuations could, without great confusion, be called measures.
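The multiplicative notation translates directly into exponent vectors. In the Python sketch below (an illustration; the helper names are hypothetical), a monomial $x_1^{t_1}\cdots x_n^{t_n}$ of $G$ is stored as the tuple $(t_1, \ldots, t_n)$. The elements of $T$ are then the 0/1 tuples, and the g.c.d. of two elements of $T$ is the componentwise minimum. A valuation is determined by its values on $x_1, \ldots, x_n$ through $v(x_1^{t_1}\cdots x_n^{t_n}) = t_1v(x_1) + \cdots + t_nv(x_n)$.

```python
def mul(a, b):
    """Product of monomials: add exponents."""
    return tuple(x + y for x, y in zip(a, b))


def div(a, b):
    """Quotient of monomials: subtract exponents (may go negative, i.e. leave T)."""
    return tuple(x - y for x, y in zip(a, b))


def gcd(a, b):
    """G.c.d. of two elements of T, i.e. the intersection of the two sets."""
    return tuple(min(x, y) for x, y in zip(a, b))


def in_T(a):
    return all(x in (0, 1) for x in a)


def valuation(values):
    """The unique valuation on G with v(x_i) = values[i]."""
    return lambda a: sum(t * w for t, w in zip(a, values))


alpha = (1, 1, 0, 0)                  # the set x1 x2
beta = (0, 1, 1, 0)                   # the set x2 x3
delta = gcd(alpha, beta)              # intersection: x2
union = div(mul(alpha, beta), delta)  # alpha beta / delta = x1 x2 x3
v = valuation([3, 1, 4, 1])           # hypothetical values v(x_i)

print(delta, union, in_T(mul(alpha, beta)))                     # (0, 1, 0, 0) (1, 1, 1, 0) False
print(v(mul(alpha, beta)) == v(alpha) + v(beta))                # True
print(v(div(alpha, beta)), v(union) + v(delta) == v(alpha) + v(beta))  # -1 True
```

Here $\alpha\beta$ falls outside $T$ because $\delta \ne 1$, as stated above. The quotient $\alpha/\beta$ has a negative exponent and lives only in $G$, yet the valuation still assigns it $v(\alpha) - v(\beta)$.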

Let $v$ be a valuation of $G$ that corresponds to a measure and gives rise to an ordering of $T$. In addition to the conditions (C), (T), (A), there are several other obvious conditions that one can write down. For example: if $\alpha \preceq \beta$ and $\gamma \preceq \delta$, then $\alpha\gamma \preceq \beta\delta$. Here, even if $\alpha, \beta, \gamma, \delta$ are in $T$, $\alpha\gamma$ and $\beta\delta$ need not be. To confine ourselves to $T$, we consider the case in which $\alpha, \beta, \gamma, \delta$ are in $T$ and there exists a monomial $\epsilon$ such that $\alpha\gamma/\epsilon$ and $\beta\delta/\epsilon$ are in $T$. The question then is whether (C), (T), (A) imply that $\alpha\gamma/\epsilon \preceq \beta\delta/\epsilon$. A priori, either this implication can be established in a purely formal way or it cannot. If it cannot, the question is whether intuition requires the conclusion $\alpha\gamma/\epsilon \preceq \beta\delta/\epsilon$. For the time being we need not enter into considerations of the latter kind, because we have the following theorem. We write $\alpha \prec \beta$ if $\alpha \preceq \beta$ obtains but $\beta \preceq \alpha$ does not.

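Theorem 1. On $T$ let there be a relation ($\preceq$) subject to the conditions (T), (A). Suppose $\alpha, \beta, \gamma, \delta$ are in $T$ and there is a monomial $\epsilon$ such that $\alpha\gamma/\epsilon$ and $\beta\delta/\epsilon$ are in $T$. Then $\alpha \preceq \beta$ and $\gamma \preceq \delta$ imply $\alpha\gamma/\epsilon \preceq \beta\delta/\epsilon$. If in addition $\alpha \prec \beta$ or $\gamma \prec \delta$, then $\alpha\gamma/\epsilon \prec \beta\delta/\epsilon$.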





410 C. H. KRAFT, J. W. PRATT, AND A. SEIDENBERG 


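Proof. First suppose that $\alpha, \beta$ have greatest common divisor 1 and $\gamma, \delta$ have greatest common divisor 1. Then also g.c.d.$(\alpha, \gamma) = 1$ and g.c.d.$(\beta, \delta) = 1$. For if, say, $x_i$ were a factor of both $\alpha$ and $\gamma$, it would not be a factor of $\beta$ or $\delta$, and there could exist no $\epsilon$ such that $\alpha\gamma/\epsilon$ and $\beta\delta/\epsilon$ are both in $T$; similarly for $\beta, \delta$. Write

$$\alpha = \alpha'\delta_1, \quad \gamma = \gamma'\gamma_1, \quad \beta = \beta'\gamma_1, \quad \delta = \delta'\delta_1,$$
$$\gamma_1 = \text{g.c.d.}(\beta, \gamma), \qquad \delta_1 = \text{g.c.d.}(\alpha, \delta).$$

One finds $\gamma'\alpha \preceq \gamma'\beta = \beta'\gamma \preceq \beta'\delta$ and $\gamma'\alpha' \preceq \beta'\delta'$, from which $\alpha\gamma/\epsilon \preceq \beta\delta/\epsilon$ follows. In the general case, let $\lambda = \text{g.c.d.}(\alpha, \beta)$ and $\mu = \text{g.c.d.}(\gamma, \delta)$. Then $\alpha/\lambda \preceq \beta/\lambda$ and $\gamma/\mu \preceq \delta/\mu$, by additivity. Also $(\alpha/\lambda)(\gamma/\mu)/(\epsilon/\lambda\mu)$ and $(\beta/\lambda)(\delta/\mu)/(\epsilon/\lambda\mu)$ are in $T$. By the first part of the proof we now have $(\alpha/\lambda)(\gamma/\mu)/(\epsilon/\lambda\mu) \preceq (\beta/\lambda)(\delta/\mu)/(\epsilon/\lambda\mu)$, that is, $\alpha\gamma/\epsilon \preceq \beta\delta/\epsilon$.

For the second part of the theorem, keep in mind that $\lambda \preceq \mu$, $\mu \preceq \nu$ and $\nu \preceq \lambda$ imply $\mu \preceq \lambda$, $\nu \preceq \mu$, $\lambda \preceq \nu$. Assuming $\beta'\delta \preceq \gamma'\alpha$, we obtain $\gamma'\beta \preceq \gamma'\alpha$ and $\beta'\delta \preceq \beta'\gamma$, whence $\beta \preceq \alpha$ and $\delta \preceq \gamma$. Thus if $\alpha \prec \beta$ or $\gamma \prec \delta$, then $\gamma'\alpha \prec \beta'\delta$, and $\alpha\gamma/\epsilon \prec \beta\delta/\epsilon$ follows.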

We shall have occasion to refer to the following condition:

Generalized Additivity (GA): If $\alpha_i \preceq \beta_i$, $i = 1, \ldots, s$, and $\prod\alpha_i$, $\prod\beta_i$ are in $T$, then $\prod\alpha_i \preceq \prod\beta_i$. If in addition $\alpha_i \prec \beta_i$ for some $i$, then $\prod\alpha_i \prec \prod\beta_i$.

Corollary to Theorem 1. Let $T$ be ordered by a relation subject to the conditions (T), (A). Then (GA) also obtains.

The proof is by induction on $s$. On the other hand, suppose one drops the assumption that $\prod\alpha_i$, $\prod\beta_i$ are in $T$ and assumes only that there is a monomial $\epsilon$ such that $\prod\alpha_i/\epsilon$, $\prod\beta_i/\epsilon$ are in $T$. Then (even assuming (C)) one cannot conclude that $\prod\alpha_i/\epsilon \preceq \prod\beta_i/\epsilon$, as we shall see from the counter-example below.³


3. Agreeing and almost agreeing measures. Let $T_1$ be an arbitrary set of monomials, with exponents possibly negative, and let $\preceq$, $\prec$ be two completely arbitrary order relations on $T_1$. We assume nothing about these relations taken individually, not even transitivity. In other words, for $\preceq$, say, we are given a set $R$ of pairs $(\alpha, \beta)$ with $\alpha, \beta \in T_1$, and we write $\alpha \preceq \beta$ if $(\alpha, \beta) \in R$; similarly, for $\prec$ there is a set of pairs $S$. The inclusion $S \subset R$ need not be assumed for what follows. Nevertheless, for slight notational convenience, at practically no loss of generality, we assume that $\alpha \prec \beta$ implies $\alpha \preceq \beta$. We refer to the completely arbitrary ordering $(\preceq, \prec)$, and say that it arises from the valuation $v$ if $\alpha \preceq \beta$ implies $v(\alpha) \le v(\beta)$ and $\alpha \prec \beta$ implies $v(\alpha) < v(\beta)$.

For arbitrary monomials $\alpha, \beta$, let us write $\alpha \trianglelefteq \beta$ if $\alpha = \prod_{i=1}^{s}\alpha_i$ and $\beta = \prod_{i=1}^{s}\beta_i$, where $\alpha_i, \beta_i \in T_1$ and $\alpha_i \preceq \beta_i$ for $i = 1, \ldots, s$. We write $\alpha \vartriangleleft \beta$ if in addition $\alpha_i \prec \beta_i$ for at least one $i$. If the given ordering arises from a valuation, then clearly $\epsilon \vartriangleleft \epsilon$ holds for no $\epsilon$.


³ Below we shall have $qs \preceq p$, $pq \preceq rs$, $ps \preceq tq$, but not $spq = (qs)(pq)(ps)/spq \preceq (p)(rs)(tq)/spq = rt$.







Definition. A completely arbitrary ordering of $T_1$ will be said to be compatible with a valuation if $\epsilon \vartriangleleft \epsilon$ holds for no $\epsilon$. (In terms of the originally given relations, the statement that $\epsilon \vartriangleleft \epsilon$ holds for some $\epsilon$ can be expressed as follows: there exists a relation $\prod(\beta_i/\alpha_i) = 1$, with $\alpha_i, \beta_i$ in $T_1$, $\alpha_i \preceq \beta_i$ for each $i$, and $\alpha_i \prec \beta_i$ for at least one $i$.)

Theorem 2. A completely arbitrary ordering of $T_1$, an arbitrary finite set of monomials, arises from a valuation if (and, trivially, only if) it is compatible with a valuation.

For the proof it will be convenient to separate out the following lemma.

Lemma 0. (a) Consider an arbitrary finite system of linear equalities and inequalities $\{l_i > 0,\ l_j' = 0,\ l_k'' \ge 0\}$, where the $l_i, l_j', l_k''$ are linear forms in indeterminates $x_1, \ldots, x_n$ with rational coefficients. One has an algorithm for deciding whether the system has a solution, and if it does, for finding one.

(b) The system $\{l_i > 0,\ l_j' = 0,\ l_k'' \ge 0\}$ of (a) has a solution if (and, trivially, only if) the following hypothesis obtains:

(H): there are no rational $\lambda_i \ge 0$, $\mu_j$, $\nu_k \ge 0$, with $\lambda_i > 0$ for at least one $i$, such that the linear form $L = \sum\lambda_il_i + \sum\mu_jl_j' + \sum\nu_kl_k''$ equals zero (that is, has all its coefficients equal to zero).⁴

Proof of the Lemma. The idea of the proof of (b) is as follows. Each step of the algorithm of (a) leads to a finite number of other systems of similar form, whose disjunction is equivalent to the given system. Moreover, at each step the hypothesis (H) carries over to at least one of the resulting systems. Ultimately the indeterminates $x_1, \ldots, x_n$ are eliminated, and (b) follows by verifying it in the case that there are no $x_i$, which is trivial.

As for the proof itself: if an inequality $l_1'' \ge 0$ occurs, we can write the system as the disjunction of the following two systems:

$$(1)\quad \{l_i > 0,\ l_1'' > 0;\ l_j' = 0;\ l_k'' \ge 0\ (k \ge 2)\}$$

and

$$(2)\quad \{l_i > 0;\ l_j' = 0,\ l_1'' = 0;\ l_k'' \ge 0\ (k \ge 2)\}.$$

One sees without difficulty that the hypothesis (H) carries over to at least one of these two systems.⁵ Therefore we may suppose all the inequalities (and equalities) to be of the form $l_i > 0$ or $l_j' = 0$. If now an equality $l_j' = 0$ occurs


⁴ If in (H) we had the word real instead of rational, this would follow directly from ([2], p. 26, Criterion III). Moreover, by Corollary 2 below, a system of linear inequalities with rational coefficients which has a real solution must also have a rational solution, and the theorem follows. Theorems on linear inequalities are intimately linked with facts about convex sets (see [2]), and a knowledge of these facts renders the theorems transparent. In fact, however, by taking care of the additional point just mentioned one can by-pass the consideration of convex sets entirely. With slight modifications, our proof of Theorem 2 yields quite simple proofs of all the theorems on inequalities given in ([2], pp. 23-28); in this connection, see Theorem 3 below.

⁵ If it did not, we would have an identity of the form $\sum\lambda_il_i + \sum\mu_jl_j' + \sum\nu_kl_k'' = 0$ with $\lambda_i \ge 0$, $\nu_k \ge 0$, and $\nu_1 > 0$, say $\nu_1 = 1$. We would also have another such identity with $\lambda_i \ge 0$, some $\lambda_i > 0$, $\nu_k \ge 0$ for $k \ge 2$, and $\nu_1 < 0$, say $\nu_1 = -1$. Adding the two identities gives an identity contradicting the hypothesis (H).







and actually involves some letter $x_1$, we can use this relation to eliminate $x_1$. It is immediate that the hypothesis (H) carries over to the resulting system. Hence we may suppose that only inequalities of the form $l_i > 0$ occur. Given a system of the form $\{l_i > 0\}$, we choose some $x_1$ that actually occurs and write the system as

$$\{m_i - x_1 > 0,\ x_1 - m_j' > 0,\ m_k'' > 0\},$$

where the $m_i, m_j', m_k''$ are forms in $x_2, \ldots, x_n$. This system has a solution if and only if the system $\{m_i - m_j' > 0,\ m_k'' > 0\}$ has a solution. Indeed, suppose $\bar x_2, \ldots, \bar x_n$ is a solution of the latter system. Then $\min_i m_i(\bar x) > \max_j m_j'(\bar x)$, and taking $\bar x_1$ arbitrarily between these numbers gives a solution $\bar x_1, \ldots, \bar x_n$ of the original system. Moreover, as one easily sees, the hypothesis (H) carries over to the system in $x_2, \ldots, x_n$. Hence the proof is complete by induction, subject to the verification for $n = 0$.
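The elimination step in this proof is Fourier-Motzkin elimination. The Python sketch below is one way to implement the algorithm of part (a) over the rationals. It is an illustration, not the authors' procedure verbatim, and it departs from the proof in a few ways:

- Instead of splitting each $l_k'' \ge 0$ into the two cases of the proof, it carries a strictness flag: a combined constraint is strict if either parent is. This is a standard equivalent variant.
- An equality $l_j' = 0$ can be entered as the pair $l_j' \ge 0$, $-l_j' \ge 0$.
- It also allows a constant term, which the lemma does not need.

A solution is recovered by back-substitution, choosing each $\bar x_k$ between the largest lower bound and the smallest upper bound, as in the proof. The names `fm_solve` and `satisfies` are hypothetical.

```python
from fractions import Fraction as F


def fm_solve(constraints, n):
    """Decide, and if possible solve, a linear system by Fourier-Motzkin elimination.

    Each constraint is (coeffs, const, strict), meaning
    sum(coeffs[i] * x[i]) + const > 0 if strict, and >= 0 otherwise.
    Returns a list of n Fractions satisfying every constraint, or None.
    """
    system = [([F(a) for a in coeffs], F(c), strict) for coeffs, c, strict in constraints]
    levels = []  # levels[k]: system in x_k, ..., x_{n-1} before x_k is eliminated
    for k in range(n):
        levels.append(system)
        lower = [row for row in system if row[0][k] > 0]  # bound x_k from below
        upper = [row for row in system if row[0][k] < 0]  # bound x_k from above
        new = [row for row in system if row[0][k] == 0]
        for a1, c1, s1 in lower:
            for a2, c2, s2 in upper:
                w1, w2 = -a2[k], a1[k]  # positive weights that cancel x_k
                new.append(([w1 * p + w2 * q for p, q in zip(a1, a2)],
                            w1 * c1 + w2 * c2, s1 or s2))
        system = new
    # Every variable is gone: each remaining condition is "const > 0" or "const >= 0".
    if any(c <= 0 if strict else c < 0 for _, c, strict in system):
        return None
    x = [F(0)] * n
    for k in reversed(range(n)):
        lows, highs = [], []
        for a, c, _ in levels[k]:
            if a[k] == 0:
                continue
            rest = c + sum(a[j] * x[j] for j in range(k + 1, n))
            (lows if a[k] > 0 else highs).append(-rest / a[k])
        if lows and highs:
            lo, hi = max(lows), min(highs)
            x[k] = (lo + hi) / 2 if lo < hi else lo  # lo == hi only for non-strict bounds
        elif lows:
            x[k] = max(lows) + 1
        elif highs:
            x[k] = min(highs) - 1
    return x


def satisfies(x, constraints):
    values = [(sum(F(a) * xi for a, xi in zip(coeffs, x)) + c, strict)
              for coeffs, c, strict in constraints]
    return all(v > 0 if strict else v >= 0 for v, strict in values)


# x0 > 0, x1 > 0, x0 - x1 > 0, 1 - x0 - x1 >= 0
feasible = [([1, 0], 0, True), ([0, 1], 0, True), ([1, -1], 0, True), ([-1, -1], 1, False)]
sol = fm_solve(feasible, 2)
print(sol, satisfies(sol, feasible))  # [Fraction(1, 2), Fraction(1, 4)] True
# x0 > 0 and -x0 >= 0 together have no solution
print(fm_solve([([1], 0, True), ([-1], 0, False)], 1))  # None
```

In the infeasible example the single combination step produces $0 > 0$, which is exactly the kind of identity $L = 0$ with a positive strict multiplier that hypothesis (H) excludes. The number of constraints can grow quickly from one elimination to the next, so the sketch is meant only for small systems.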

Proof of Theorem 2. The theorem is seen to be a corollary of the lemma once it is rewritten in additive form. Namely, if in any valuation $x_i$ gets the value $\bar x_i$, then $\prod x_i^{r_i}$ gets the value $\sum r_i\bar x_i$. Let $\alpha = \prod x_i^{r_i}$ and $\beta = \prod x_i^{s_i}$. Then $\alpha \preceq \beta$ yields $\sum(s_i - r_i)\bar x_i \ge 0$, and $\alpha \prec \beta$ yields $\sum(s_i - r_i)\bar x_i > 0$. Corresponding to the power product $\beta/\alpha$, consider the linear form $l = \sum(s_i - r_i)x_i$ (in indeterminates $x_i$). Let $\{l_i\}$ be the set of linear forms arising from $\beta/\alpha$ with $\alpha \prec \beta$, and $\{l_k''\}$ the set of linear forms arising from $\beta/\alpha$ with $\alpha \preceq \beta$. The assertion that the ordering arises from a valuation thus amounts to saying that the system $\{l_i > 0,\ l_k'' \ge 0\}$ has a solution. A condition $\prod(\beta_i/\alpha_i) = 1$, rewritten in additive form, becomes $\sum l_i = 0$; that is, the linear form $L = \sum l_i$ has all its coefficients equal to zero.

The compatibility condition can then be stated as follows: there are no integers $\lambda_i \ge 0$, $\nu_k \ge 0$, with $\lambda_i > 0$ for at least one $i$, such that the linear form $L = \sum\lambda_il_i + \sum\nu_kl_k''$ equals zero. Here, if $L$ corresponds to $\prod(\beta_i/\alpha_i)$, then $\lambda_i$ counts the number of times a $\beta_i/\alpha_i$ with $\alpha_i \prec \beta_i$ occurs, and $\nu_k$ counts the number of times a $\beta_k/\alpha_k$ with $\alpha_k \preceq \beta_k$ occurs. Moreover, since the coefficients of $L$ are homogeneous in the $\lambda_i, \nu_k$, the compatibility hypothesis can also be stated as follows: there are no rational $\lambda_i \ge 0$, $\nu_k \ge 0$, with $\lambda_i > 0$ for at least one $i$, for which $L = 0$. This is just hypothesis (H) of the lemma, so the system has a solution, and the desired valuation exists.
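The additive translation in this proof can be seen on the relations of footnote 3. There $qs \preceq p$, $pq \preceq rs$ and $ps \preceq tq$, while $spq \preceq rt$ fails. Under (C) this means $rt \prec spq$. The Python sketch below is illustrative: the helper name is hypothetical, and each monomial is written as a string of its letters. For each relation $\alpha \preceq \beta$ it forms the linear form of $\beta/\alpha$, and it then adds the forms with every multiplier equal to 1. The sum has all coefficients zero while one of the relations is strict, i.e. $\epsilon \vartriangleleft \epsilon$. By Theorem 2, no valuation can therefore produce such an ordering.

```python
from collections import Counter


def linear_form(alpha, beta):
    """Exponent vector of beta/alpha; monomials are strings such as 'pq' for pq."""
    form = Counter(beta)
    form.subtract(Counter(alpha))  # subtract keeps zero and negative counts
    return form


# (alpha, beta, strict): alpha <= beta, or alpha < beta when strict is True.
relations = [("qs", "p", False), ("pq", "rs", False), ("ps", "tq", False),
             ("rt", "spq", True)]

L = Counter()
for alpha, beta, _ in relations:
    L.update(linear_form(alpha, beta))  # multiplier 1 for every relation

all_zero = all(coefficient == 0 for coefficient in L.values())
some_strict = any(strict for _, _, strict in relations)
print(all_zero and some_strict)  # True: epsilon < epsilon, so no valuation exists
```

Here the multipliers were found by inspection. In general, a search for them is the dual of solving the system $\{l_i > 0,\ l_k'' \ge 0\}$, which is what Lemma 0 decides.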

As corollaries of the lemma, we have the following.

Corollary 1. Given an arbitrary ordering of $T_1$, an arbitrary finite set of monomials in $x_1, \ldots, x_n$, one has an algorithm for deciding whether the ordering arises from a valuation, and if it does, for finding one. The number $N$ of steps needed is a simple (in fact, primitive recursive) function of $n$ and $b$, where $b$ is a bound on the exponents of the $x_i$.

The algorithm applies to a system over an arbitrary ordered field. Moreover, one gets the following useful corollary.

Corollary 2. If a finite system of linear equalities and inequalities with coefficients in an ordered field $F$ has a solution in an ordered extension field $G$ of $F$, then it also has a solution in $F$.

Proof. The algorithm for deciding relative to $G$ is identical with that relative to $F$.







Given a linear system of equalities and inequalities with rational coefficients, let $B$ be a bound on the absolute values of the numerators and denominators of the coefficients, when these are written as quotients of integers. Following the above algorithm, one sees that $2B^4$ is a similar bound for the system obtained by eliminating $x_1$. Hence one sees how to write down a simple function of $B$ and $n$ which bounds the numerators and denominators of some solution (if there are solutions). If the equalities and inequalities are homogeneous, then there is an integral solution, and one has a bound for one such solution.

Now write $\epsilon \vartriangleleft \epsilon$ for some $\epsilon$ in the form $\prod(\beta_i/\alpha_i)^{r_i} = 1$, where:

- $\alpha_i, \beta_i \in T_1$, and $\beta_i/\alpha_i \ne \beta_j/\alpha_j$ for $i \ne j$;
- $\alpha_i \preceq \beta_i$ for every $i$, and $\alpha_i \prec \beta_i$ for some $i$;
- $r_i \ge 0$, and $r_i > 0$ for at least one $i$ with $\alpha_i \prec \beta_i$.

Writing out the $\alpha_i, \beta_i$ as monomials in the $x_j$ and comparing exponents, one obtains a system of homogeneous linear conditions on the $r_i$. If this system has a solution, then it has one with the $r_i$ integral and bounded as just explained. Hence we have the following corollary.

Corollary 3. Let $T_1$ be an arbitrary set of monomials in $n$ variables with exponents bounded by $b$. For every $n$ and $b$ one can find an $N$ with the following property. An arbitrary ordering of $T_1$ arises from a valuation if and only if there is no relation of the form $\prod(\beta_i/\alpha_i)^{r_i} = 1$ with $0 \le r_i \le N$ and $r_i \ne 0$ for some $i$ such that $\alpha_i \prec \beta_i$. Here $N$ is a simple (in fact, primitive recursive) function of $n$ and $b$. (For $T_1 = T$ the bound depends only on $n$.)⁶

In a general axiomatic theory of probability it would undoubtedly be of 
significance to let the values or measures be elements of an arbitrary simply 
ordered group, because such groups are capable of accommodating events p, q
with p more probable than q but only by an infinitely small amount. For finite
sets, however, one has the following corollary. 

Corollary 4. If an ordering of $T_1$ arises by assigning values to the $x_i$ from a simply ordered group, then the ordering can also be obtained by assigning real numbers to the $x_i$.

Proof. If the ordering arises as assumed, then the condition of the theorem obviously obtains.

Definition. By an almost agreeing valuation one means a valuation, other than the one for which $v(x_i) = 0$ for every $i$, such that $\alpha < \beta$ implies $v(\alpha) \leq v(\beta)$. In the case $v(x_i) \geq 0$ for every $i$, we speak of an almost agreeing measure.

Theorem 3. Let $T_1$ be an arbitrary finite set of monomials containing $1, x_1, \ldots, x_n$, and ordered arbitrarily subject to the conditions $1 < x_1, \ldots, 1 < x_n$. Then the ordering admits an almost agreeing measure if and only if no monomial $\prod_i (\beta_i/\alpha_i)^{r_i}$, with $\alpha_i, \beta_i \in T_1$, $\alpha_i < \beta_i$, $r_i \geq 0$ and some $r_i > 0$, has all its exponents negative.⁷

Proof. This time (see Theorem 2, proof) we have a system $\{l_i \geq 0\}$ for which there is to be a solution; the hypothesis is that for no rational $\lambda_i$, $\lambda_i \geq 0$, some $\lambda_i > 0$, does the linear form $\sum_i \lambda_i l_i$ have all its coefficients negative. Taking into account Corollary 2 above, this follows directly from ([2], p. 27,


6 For $T_1 = T$, a more special analysis shows that $N = n!$ is a suitable bound. The $\beta_i/\alpha_i$ can be taken to be at most $n + 1$ in number.


7 Finiteness conditions hold here as in Theorem 2 and corollaries. 





414 C. H. KRAFT, J. W. PRATT, AND A. SEIDENBERG 


Criterion VI). A short self-contained proof can be given as follows. Write the given system $\{l_i \geq 0\}$, relative to some variable $x_1$ which occurs, in the form $\{m_i - x_1 \geq 0,\ x_1 - m_j \geq 0,\ m_k \geq 0\}$. Then the hypothesis does not necessarily carry over to the system $\{m_i - m_j \geq 0,\ m_k \geq 0\}$. However, if the elimination is likewise carried out relative to a second variable $x_2$ which occurs, then one sees that the hypothesis carries over to at least one of the resulting systems. Hence the induction holds, and the theorem follows upon verification for $n = 1$ (and $n = 0$).


4. The counter-examples. To facilitate the exposition, we state the following 
proposition and theorem, but postpone the proofs for a moment. 

Proposition 1. In a simple ordering of the subsets of $S = \{x_1, \ldots, x_n\}$ which satisfies additivity, the last $2^{n-1}$ subsets are the complements of the first $2^{n-1}$ in reverse order.

Theorem 4. Let the $2^n$ subsets of $S = \{x_1, \ldots, x_n\}$ be simply ordered and assume that the last $2^{n-1}$ subsets are the complements of the first $2^{n-1}$ in reverse order. Let $U$ be the first $2^{n-1} + 1$ subsets and assume that 1, the empty set, is the first element of $U$ and that $\alpha\beta \in U$ implies $\alpha, \beta \in U$. Then if additivity holds for $U$ (i.e., if $\alpha\gamma < \beta\gamma$ implies $\alpha < \beta$ for all $\alpha\gamma, \beta\gamma$ in $U$), it also holds for the whole ordering of the $2^n$ subsets $T$.

The first counter-example stems from trying to see whether Theorem 1 can be extended to three inequalities (in five letters, the fewest for which the extension can fail). One has to put down three inequalities such that all three, but no two, lead (as in Theorem 1) to a new relation; say

qs < p, pq < rs, ps < tq.

In any agreeing measure one would have to have pqs < rt (add the three inequalities and cancel), so we put down

rt < pqs


and try to fit these four inequalities into a simple ordering of the 32 subsets of {p, q, r, s, t} which satisfies additivity. Starting with 1 < q < r < s < qr < qs < rs < qrs, which obviously satisfies additivity, we adjoin the relations qs < p, pq < rs to get


1 < q < r < s < qr < qs < p < pq < rs
(the complements of which, in {p, q, r, s}, in reverse order are
pq < rs < qrs < pr < ps < pqr < pqs < prs < pqrs).
Additivity clearly holds for these first 9 subsets, hence also for all 16 by Theorem 4.
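For 16 subsets the claim can be confirmed by exhaustive checking. In this sketch (helper names ours), additivity is tested in the form "α < β if and only if αγ < βγ for every nonempty γ disjoint from α and β", the complement condition of Theorem 4 is tested directly, and the empty string stands for the empty set 1.

```python
from itertools import combinations


def is_additive(order):
    """order: all subsets of the letters, smallest first, as strings ('' is 1)."""
    rank = {frozenset(w): i for i, w in enumerate(order)}
    letters = frozenset().union(*rank)
    for a in rank:
        for b in rank:
            free = sorted(letters - a - b)
            for size in range(1, len(free) + 1):
                for g in map(frozenset, combinations(free, size)):
                    # Adjoining the same disjoint gamma must not change the order.
                    if (rank[a] < rank[b]) != (rank[a | g] < rank[b | g]):
                        return False
    return True


def complements_reversed(order):
    """True if the last half lists the complements of the first half in reverse."""
    sets = [frozenset(w) for w in order]
    full = frozenset().union(*sets)
    m = len(sets)
    return all(sets[m - 1 - i] == full - sets[i] for i in range(m // 2))


ORDER16 = ["", "q", "r", "s", "qr", "qs", "p", "pq", "rs",
           "qrs", "pr", "ps", "pqr", "pqs", "prs", "pqrs"]
print(is_additive(ORDER16), complements_reversed(ORDER16))  # True True
```

The check is cubic in the number of subsets and exponential in n, so it is meant only for small examples like this one.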

Since rt and pqs are complements, rt will have to be among the first 16 of the sought example; hence also qt and t. On the other hand pqs is 14th in the above ordering of the subsets of {p, q, r, s}. Hence we try to adjoin t < qt < rt to the 13 sets preceding pqs. It is convenient to try to take rt as the 16th element, as then pqs will be the 17th and no new elements enter into consideration.







Placing rt 16th, from pqr < rt one gets the requirement pq < t; and from tq < pqs one gets t < ps. Since ps < tq, we must place tq either directly before pqr or directly after it. Placing tq < pqr, from qrs < tq < pqr one gets the requirements rs < t < pr. Now all requirements for additivity have been found. In fact, consider the ordering


1 < q < r < s < qr < qs < p < pq
< rs < t < qrs < rp < ps < tq < qrp < rt < spq
(and then by complements)
spq < st < rsp < qrt < qst < pt < qrsp < qpt < rst
< qrst < rpt < spt < qrpt < qspt < rspt < pqrst.


In checking additivity one has to see that cancellation with an element involving t preserves order. As far as canceling t is concerned, this checks upon observing that t, tq, tr are in correct order. As for canceling q, one has only to consider the elements adjacent to tq which involve q, namely qrs and qrp; this gives rs < t < rp, which checks, and moreover was checked in the course of the construction. Similarly qrp < rt yields qp < t, which checks. Of course one can check directly that the above ordering gives the desired counter-example, without recourse to Proposition 1 or Theorem 4, or Theorems 1, 2, and 3 for that matter.
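The direct check mentioned above can be done exhaustively. The sketch below (our helper names; "1" denotes the empty set) confirms that the 32-element ordering displayed above is additive and that it contains the four relations qs < p, pq < rs, ps < tq, rt < pqs, which no strictly agreeing measure can satisfy.

```python
from itertools import combinations

# The ordering of the 32 subsets of {p, q, r, s, t} displayed above ("1" = empty set).
ORDER32 = ("1 q r s qr qs p pq rs t qrs rp ps tq qrp rt spq st rsp qrt qst pt "
           "qrsp qpt rst qrst rpt spt qrpt qspt rspt pqrst").split()


def rank_table(order):
    return {frozenset("" if w == "1" else w): i for i, w in enumerate(order)}


def is_additive(order):
    """alpha < beta iff alpha*gamma < beta*gamma, for all nonempty gamma
    disjoint from alpha and beta (cancellation preserves order)."""
    rank = rank_table(order)
    letters = frozenset().union(*rank)
    for a in rank:
        for b in rank:
            free = sorted(letters - a - b)
            for size in range(1, len(free) + 1):
                for g in map(frozenset, combinations(free, size)):
                    if (rank[a] < rank[b]) != (rank[a | g] < rank[b | g]):
                        return False
    return True


RANK = rank_table(ORDER32)


def before(x, y):
    return RANK[frozenset(x)] < RANK[frozenset(y)]


print(len(RANK), is_additive(ORDER32))  # 32 True
print(before("qs", "p"), before("pq", "rs"),
      before("ps", "tq"), before("rt", "pqs"))  # True True True True
```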

One can also obtain a counter-example as follows. While the given inequalities have no strictly agreeing measure, they do have almost agreeing ones, and from one such one can easily obtain an additive ordering. In fact, let P, Q, R, S, T be the values in an almost agreeing measure. Then from

$$Q + S \leq P, \quad P + Q \leq R + S, \quad P + S \leq Q + T, \quad R + T \leq P + Q + S$$

and the fact that $(Q + S) + \cdots + (R + T) = P + \cdots + (P + Q + S)$, one finds $Q + S = P$, $P + Q = R + S$, $P + S = Q + T$, $R + T = P + Q + S$; from which $R = 2Q$, $P = Q + S$, $T = 2S$; and these conditions are sufficient. Taking Q and S so that p, q, r, s, t get distinct values (say Q = 1, S = 3; R = 2, P = 4, T = 6), one sees that no element other than rt and pqs gets the value v(rt) = v(pqs). Keeping R and T fixed but decreasing Q, P, S slightly (say by .1 to Q = .9, S = 2.9, P = 3.9), we get a measure in which qs < p, pq < rs, ps < qt, qps < rt and in which 15 elements have value less than v(pqs) and 15 have value greater than v(rt). Now we change P, Q, R, S, T slightly so that the 32 elements get distinct values, the inequalities qs < p, pq < rs, ps < qt, qps < rt are maintained, and also so that qps and rt remain in the middle (say by taking S = 2.89, T = 5.9, R = 2.2, keeping Q = .9,







P = 3.9). The resulting order is additive, since it has a strictly agreeing measure; but then so is the order one gets by interchanging the middle elements (these two are complements, so no cancellation ever compares them). In this way one gets an example of the desired kind (and, in fact, with the stated values, the above example).⁸
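The second construction can be replayed with exact rational arithmetic. This sketch (helper names ours) checks three things: with Q = 1, S = 3 only rt and pqs take the middle value 8; the perturbed values Q = .9, R = 2.2, S = 2.89, P = 3.9, T = 5.9 give 32 distinct sums with pqs and rt in 16th and 17th place; and interchanging these two reproduces the ordering displayed earlier in this section.

```python
from fractions import Fraction
from itertools import combinations

LETTERS = "pqrst"
ORDER32 = ("1 q r s qr qs p pq rs t qrs rp ps tq qrp rt spq st rsp qrt qst pt "
           "qrsp qpt rst qrst rpt spt qrpt qspt rspt pqrst").split()


def value(values, subset):
    return sum((values[x] for x in subset), Fraction(0))


def ranked_subsets(values):
    """All 32 subsets of {p, q, r, s, t}, sorted by their total value."""
    subsets = [frozenset(c) for k in range(6) for c in combinations(LETTERS, k)]
    return sorted(subsets, key=lambda A: value(values, A))


# An almost agreeing measure: Q = 1, S = 3, hence R = 2Q, P = Q + S, T = 2S.
v0 = {"q": Fraction(1), "s": Fraction(3)}
v0.update(r=2 * v0["q"], p=v0["q"] + v0["s"], t=2 * v0["s"])
ties = [A for A in ranked_subsets(v0) if value(v0, A) == 8]
print(sorted("".join(sorted(A)) for A in ties))  # ['pqs', 'rt']

# The perturbed values from the text give 32 distinct sums.
v1 = {x: Fraction(s) for x, s in
      zip("qrspt", ["0.9", "2.2", "2.89", "3.9", "5.9"])}
order = ranked_subsets(v1)
print(len({value(v1, A) for A in order}))                           # 32
print(order[15] == frozenset("pqs"), order[16] == frozenset("rt"))  # True True

# Interchanging the two middle elements reproduces the displayed ordering.
order[15], order[16] = order[16], order[15]
expected = [frozenset("" if w == "1" else w) for w in ORDER32]
print(order == expected)  # True
```

Fractions are used instead of floats so that near-ties such as ps = 6.79 against qt = 6.8 are compared exactly.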

We now give a counter-example showing that the order on the subsets of {q, r, s, p, t} given above can be extended to the subsets of {q, r, s, p, t, w} in such a way that the resulting order, though it satisfies (C), (T), (A), does not almost agree with any measure.

As already noted, the given order has an almost agreeing measure (e.g., Q = 1, R = 2, S = 3, P = 4, T = 6). Hence in the desired counter-example, w would have to be amongst the first 32; otherwise it would be 33rd and any value W of w equal to or greater than v(pqrst) = 16 would yield an almost agreeing measure. A similar argument shows that at least one other element involving w, hence qw, must be amongst the first 32. The 30th element in the above order is qspt. Placing this 32nd, so that rw is 33rd, whence qrpt < rw, we get qpt < w. Now inserting w < qw between two elements which must have equal value in any almost agreeing measure, say between qrst and rpt, the resulting order can have no almost agreeing measure. In fact, if P, ..., W are the proposed values, then from $v(qrst) \leq W \leq Q + W \leq v(rpt) = v(qrst)$ we get $W = Q + W$, so Q = 0, hence R = 0 (from R = 2Q), S = 0 (from s < qr), T = 0 (from T = 2S), P = 0 (from P = Q + S), and W = 0 (from w < pqrst). Hence there can be no almost agreeing measure. The order of the first 33 subsets now is:


1 < q < ··· < qrst < w < qw < rpt < spt < qrpt < qspt < rw < ···.


To check additivity it remains to see that order is preserved upon canceling q in qrst < qw < qrpt; since rst < w < rpt, this checks.

In accordance with Theorem 3 one finds that

$(rs/qp)^{r_1}(qt/sp)^{r_2}(qsp/rt)^{r_3}(w/qrst)^{r_4}(rpt/qw)^{r_5}(rw/qspt)^{r_6}$,

for suitable positive integers $r_1, \ldots, r_6$, has all its exponents negative.
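The exponent condition can be checked mechanically. The weights in this sketch are one admissible choice that we obtained by solving the six linear conditions by hand; they are an assumption of the example, not values taken from the text. With them every letter receives exponent -1, so the product is $1/(pqrstw)$.

```python
LETTERS = "pqrstw"


def exponent_vector(word):
    return [word.count(ch) for ch in LETTERS]


# beta/alpha for the relations pq < rs, ps < qt, rt < pqs, qrst < w, qw < rpt, qspt < rw
factors = [("rs", "qp"), ("qt", "sp"), ("qsp", "rt"),
           ("w", "qrst"), ("rpt", "qw"), ("rw", "qspt")]
weights = [24, 38, 40, 20, 28, 7]  # our own choice, not from the text

total = [0] * len(LETTERS)
for (num, den), r in zip(factors, weights):
    for i, (a, b) in enumerate(zip(exponent_vector(num), exponent_vector(den))):
        total[i] += r * (a - b)

print(dict(zip(LETTERS, total)))  # every exponent is -1
```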

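Proof of Proposition 1. Let $\alpha < \beta$ and write $\alpha = \alpha_1\gamma$, $\beta = \beta_1\gamma$, where $\gamma = \text{g.c.d.}(\alpha, \beta)$. Let $\delta$ = complement of $\alpha_1\beta_1\gamma$. Then comp. $\alpha_1\gamma = \beta_1\delta$ and comp. $\beta_1\gamma = \alpha_1\delta$. Since $\alpha_1 < \beta_1$ (by additivity), we have comp. $\beta_1\gamma$ < comp. $\alpha_1\gamma$. Thus complementation reverses order. The proposition would now follow unless some two elements among the first $2^{n-1}$, say $\alpha, \beta$, were complements; and likewise unless some two elements among the last $2^{n-1}$, say $\gamma, \delta$, were complements. Suppose then these conditions obtain with $\alpha < \beta < \gamma < \delta$. From $\alpha < \gamma$, $\beta < \delta$ we get by Theorem 1 that $\alpha\beta < \gamma\delta$, a contradiction, since $\alpha\beta = \gamma\delta$.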

Proof of Theorem 4. First note that $\alpha < \alpha\gamma$ for any $\alpha\gamma$ in $T$, $\gamma \neq 1$. In fact, if $\alpha\gamma$ is in the first half of $T$, then $\alpha$ is also, and $\alpha < \alpha\gamma$ follows from additivity. If $\alpha\gamma$ is in the second half and $\alpha$ is also, the assertion follows upon taking





8 The advantage of the second method is that it leaves unanswered the question whether a simple, additive ordering of T necessarily has an almost agreeing measure. Especially it leaves it unanswered for n = 5.







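complements; it follows trivially if $\alpha\gamma$ is in the second half and $\alpha$ is in the first half. Let then $\alpha, \beta, \alpha\gamma, \beta\gamma \in T$, $\alpha < \beta$, and assume $\beta\gamma < \alpha\gamma$, so $\gamma \neq 1$ and $\alpha < \beta < \beta\gamma < \alpha\gamma$. We may assume that $\beta$ is in the first half of $T$, as otherwise we can reduce to this case by taking complements. If now $\beta\gamma$ is in the second half, then $\alpha/\beta = \alpha\gamma/\beta\gamma = (\text{complement } \beta\gamma)/(\text{complement } \alpha\gamma) = \mu/\lambda$, where $\mu, \lambda$ are in the first half and $\lambda < \mu$. This contradicts (A) of $U$. Hence we may assume that $\beta\gamma$ is in the first half.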

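We also suppose $\alpha\gamma$ is in the second half and in fact not the $(2^{n-1} + 1)$th element $\nu$, otherwise we already have a contradiction. Also, by displacing g.c.d.$(\alpha, \beta)$ to $\gamma$ we may assume g.c.d.$(\alpha, \beta) = 1$. Assuming this done, let $\delta$ be the complement of $\alpha\beta\gamma$. Let $\alpha_1, \beta_1, \gamma_1, \delta_1$ be the g.c.d.'s of $\alpha, \beta, \gamma, \delta$ respectively with $\nu$, and $\alpha_2, \beta_2, \gamma_2, \delta_2$ their complements in $\alpha, \beta, \gamma, \delta$ (so that $\alpha = \alpha_1\alpha_2$, and so on). Then we have: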


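$$\alpha_1\alpha_2 < \beta_1\beta_2 < \beta_1\beta_2\gamma_1\gamma_2 \leq \alpha_2\beta_2\gamma_2\delta_2 < \alpha_1\beta_1\gamma_1\delta_1 = \nu < \alpha_1\alpha_2\gamma_1\gamma_2$$

and, by complements, $\beta_1\beta_2\delta_1\delta_2 < \alpha_2\beta_2\gamma_2\delta_2 < \alpha_1\beta_1\gamma_1\delta_1 = \nu$. Writing

$$\alpha/\beta = \alpha\gamma/\beta\gamma = (\alpha\gamma/\nu)\cdot(\nu/\beta\gamma)$$

and preparing to get a contradiction by following the proof of Theorem 1, we note from the first line, by an allowable cancellation, that $\beta_2\gamma_2 < \alpha_1\delta_1$; and from the second, $\beta_1\delta_1 < \alpha_2\gamma_2$. Also $\beta_1\beta_2\gamma_2 < \beta_1\alpha_1\delta_1$, from the first line; and we want $\beta_1\alpha_1\delta_1 < \alpha_1\alpha_2\gamma_2$. If $\alpha_1\alpha_2\gamma_2 \leq \alpha_1\beta_1\delta_1$ $(\leq \nu)$, then $\alpha_2\gamma_2 \leq \beta_1\delta_1$, a contradiction. So we have $\beta_1\beta_2\gamma_2 < \alpha_1\alpha_2\gamma_2$. Notationally, this means we can assume $\gamma_1 = 1$; and from the symmetry of the situation (i.e., the fact that hypothesis and conclusion are unaltered by interchange of $\nu$ and its complement), that $\gamma_2 = 1$, a contradiction. Or explicitly, we have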


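$$\alpha_1\alpha_2 < \beta_1\beta_2 < \beta_1\beta_2\gamma_2 \leq \alpha_2\beta_2\gamma_2\delta_2 < \alpha_1\beta_1\gamma_1\delta_1 = \nu$$

(and $\nu < \alpha_1\alpha_2\gamma_2$, as otherwise we already have a contradiction), and, by complements,

$$\beta_1\beta_2\gamma_1\delta_1\delta_2 \leq \alpha_2\beta_2\gamma_2\delta_2 < \alpha_1\beta_1\gamma_1\delta_1 = \nu.$$

Now we get $\beta_1 \leq \alpha_2\delta_2$, $\beta_2\delta_2 < \alpha_1$, whence $\beta_2\beta_1 \leq \beta_2\alpha_2\delta_2 < \alpha_1\alpha_2$, a contradiction.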


5. Axiomatic considerations. We are concerned now with putting down axiomatic conditions on an ordering of the subsets of S which will make the ordering compatible with a valuation. The axioms will refer to a set $T'$ which is axiomatically left undefined but which intuitively arises by a simple construction from S. For convenience the set $T'$ will be infinite, though actually only a finite portion of $T'$ is involved.⁹ In fact, for a moment it may be helpful to think of $T'$ as the set of all events. Because of this, or because of the construction, one will compare some pairs of elements from $T'$, but not all pairs. That is, for $T'$ we will not assume Comparability, though we will assume Transitivity


9 A bound on the number of elements of a suitable $T'$ can be computed using Theorem 2, Corollary 3.







and Additivity; we would allow Generalized Additivity, but can (and will) manage without it. In addition we take the following axiom:

Polarizability (P). For any $\alpha$ in $T'$ there exist elements $\alpha', \alpha''$ in $T'$ with $\alpha = \alpha'\alpha''$, $\alpha' \leq \alpha''$, $\alpha'' \leq \alpha'$.

Though we do not assume comparability, we will assume the following: 

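P-Comparability (PC). If $\alpha = \alpha'\alpha''$, $\alpha' \leq \alpha''$, $\alpha'' \leq \alpha'$, $\beta = \beta'\beta''$, $\beta' \leq \beta''$, $\beta'' \leq \beta'$, and $\alpha \leq \beta$, then $\alpha' \leq \beta'$.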

The intuitive content of (P) is that any event $\alpha$ is equivalent to the disjunction of two incompatible and equally likely events $\alpha', \alpha''$. By "equivalent" we mean that $\alpha$ occurs if and only if $\alpha'$ or $\alpha''$ occurs. The content of this axiom is intuitively quite compelling. If $\alpha$ is any event (outcome), we can compose it with an irrelevant event having just two incompatible and equally likely outcomes, say, for example, the tossing of a coin. Let $\alpha^*$ be this composite event. Then $\alpha$ and $\alpha^*$ have essentially the same significance. Now let $\alpha'$ be composed of $\alpha$ and the outcome heads and let $\alpha''$ be composed of $\alpha$ and the outcome tails. Then $\alpha^* = \alpha'\alpha''$, $\alpha' \leq \alpha''$, $\alpha'' \leq \alpha'$.

Starting with the set $S = \{x_1, \ldots, x_n\}$, we polarize its elements, i.e., we apply (P) to them, then polarize the results, etc. Call the resulting set $S'$. The set $T'$ is obtained by composing elements of $S'$ which do not overlap in content. It is clear that we shall want transitivity, additivity, and even generalized additivity in $T'$; moreover it is also clear that we shall not want to assume comparability. For we might be quite willing to compare $x_1$ and $x_2$ in likelihood, and yet be quite unwilling to compare $x_1'$ and $x_2'$, where $x_1'$ is composed of $x_1$ and the tossing of coins and $x_2'$ is composed from $x_2$ and the tossing of coins. In fact, such comparisons would amount to attaching precise numerical values to the probabilities.

Now for the axioms: 

The elements of the set $T'$ are undefined. In $T'$ we have a binary operation, multiplication, which is commutative and associative and has an identity 1. Elements $\alpha, \beta$ in $T'$ are said to be disjoint if $\alpha = \gamma\delta$, $\beta = \gamma\epsilon$, $\gamma, \delta, \epsilon$ in $T'$, implies $\gamma = 1$. For $\alpha_1, \ldots, \alpha_m$ in $T'$, $\alpha_1 \cdots \alpha_m$ is in $T'$ if and only if $\alpha_1, \ldots, \alpha_m$ are mutually disjoint. There is a transitive relation $\leq$ in $T'$. Concerning this, we assume (A), (P), and (PC).

This system could be considerably weakened, but the main point for the 
present is to get polarizability in while avoiding comparability. 

Given a relation of the form $f \leq g$ or $f < g$, say for concreteness $\alpha\beta \leq \gamma\epsilon$, one can obtain other relations by substituting for each of $\alpha, \beta, \ldots, \epsilon$ one of the two corresponding polarized components, e.g., $\alpha'\beta'' \leq \gamma'\epsilon''$. The relations so obtained will be said to be derived by polarizing the given relation. For the next theorem, we note the following lemmas.

Lemma 1. A product $\alpha_1\alpha_2 \cdots \alpha_m$ can be polarized by polarizing its factors.

Proof. This follows by induction on $m$ if it holds for $m = 2$. For $m = 2$,


10 In the present setting, we write (A) in the following form: Let $\gamma$ be disjoint from $\alpha, \beta$; then $\alpha \leq \beta$ if and only if $\alpha\gamma \leq \beta\gamma$. Also $1 \leq \gamma$ for every $\gamma$.







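let $\alpha_1', \alpha_1'', \alpha_2', \alpha_2''$ be polar components of $\alpha_1, \alpha_2$, so that $\alpha_1' \leq \alpha_1''$, $\alpha_1'' \leq \alpha_1'$, and similarly for $\alpha_2$. Since $\alpha_1, \alpha_2$ are disjoint, so are the components of $\alpha_1$ and those of $\alpha_2$. Hence $\alpha_1'\alpha_2' \leq \alpha_1''\alpha_2' \leq \alpha_1''\alpha_2''$, and symmetrically $\alpha_1''\alpha_2'' \leq \alpha_1'\alpha_2'$.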

Lemma 2. Let $f, g$ be products of mutually disjoint elements $x_1, \ldots, x_n$. If a relation of the form $f \leq g$ or $f < g$ obtains, then also the relations derived by polarizing the given relation obtain.

Proof. For $f \leq g$, in view of Lemma 1, this follows from (PC). For $f < g$, in view of Lemma 1, we have to see that if $f$ is polarized into $f'f''$, $g$ into $g'g''$, then $f' < g'$. If not, then $g' \leq f'$; and since $g'' \leq g' \leq f' \leq f''$, also $g'' \leq f''$. In the case that $f$ and $g$ are disjoint, we get $g'g'' \leq f'g'' \leq f'f''$, hence $g \leq f$, a contradiction. The general case can be reduced to this case by canceling the $x_i$ common to $f$ and $g$.

Theorem 5. Let $x_1, \ldots, x_n$ be mutually disjoint elements of $T'$ and let $T$ be the $2^n$ products of the $x_i$ in $T'$. Let $T = \{\alpha_1, \ldots, \alpha_m\}$, $m = 2^n$, and assume that $1 = \alpha_1 < \alpha_2 < \cdots < \alpha_m$. Then the order imposed upon $T$ arises from a valuation.

Proof. We show that the order imposed upon $T$ is compatible with a valuation. Let $\beta_i, \gamma_i$ be monomials in $T$ with $\beta_i \leq \gamma_i$ for $i = 1, \ldots, s$, and $\beta_j < \gamma_j$ for some $j$; and assume that each $x_i$ occurs as often among the $\beta_i$ as among the $\gamma_i$ (in other words, in terms of the definitions preceding Theorem 2, that $\epsilon < \epsilon$ for $\epsilon = \prod \beta_i = \prod \gamma_i$). We polarize $\beta_i \leq \gamma_i$ (by polarizing the $x_i$), then polarize the results, etc., until each $x_i$ is split into $2^s$ parts. By an appropriate choice $\beta_i^* \leq \gamma_i^*$ of the polarized relations, we can arrange matters so that no polarized component of an $x_i$ occurs in more than one $\beta_i^*$, and similarly with the $\gamma_i^*$, and so that the same components of the $x_i$ occur among the $\beta_i^*$ and $\gamma_i^*$. By Theorem 1, Corollary, applied to the set of $n \cdot 2^s$ polar components of the $x_i$, $\prod \beta_i^* < \prod \gamma_i^*$, and this is a contradiction since $\prod \beta_i^* = \prod \gamma_i^*$. Hence the ordering of $T$ is compatible with a valuation, and by Theorem 2, arises from a valuation.

The object of the present section, and what Theorem 5 shows, is that the condition "$\epsilon < \epsilon$ for no $\epsilon$" is imposed on us by our intuition when we confront probabilities. For example, we reject the order given in the first counter-example above because if we judge qs < p, pq < rs, ps < tq, rt < spq, then we will also judge $q's' < p'$, $p'q'' < r's'$, $p''s'' < t'q'$, $r't' < s''p''q''$, where $p', p'', q', q'', r', (r''), s', s'', t', (t'')$ are polar components of $p, q, r, s, t$ respectively, and $u = (q's')(p'q'')(p''s'')(r't') < (p')(r's')(t'q')(s''p''q'') = v$, which is a contradiction since $u = v$.
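The bookkeeping behind $u = v$ is a multiset identity among polar components, which can be checked mechanically. In this sketch the string notation ("p'" for $p'$, "p''" for $p''$) is ours; it also confirms that each derived relation uses only components of the letters in the relation it was polarized from.

```python
from collections import Counter

# Polar components written as strings: "p'" for p', "p''" for p'', and so on.
lhs = [["q'", "s'"], ["p'", "q''"], ["p''", "s''"], ["r'", "t'"]]
rhs = [["p'"], ["r'", "s'"], ["t'", "q'"], ["s''", "p''", "q''"]]
original = [("qs", "p"), ("pq", "rs"), ("ps", "tq"), ("rt", "spq")]


def base(component):
    """The letter a polar component comes from."""
    return component.rstrip("'")


# Each derived relation is obtained by polarizing the corresponding original one.
for (small, big), left, right in zip(original, lhs, rhs):
    assert sorted(map(base, left)) == sorted(small)
    assert sorted(map(base, right)) == sorted(big)

u = Counter(c for group in lhs for c in group)
v = Counter(c for group in rhs for c in group)
print(u == v)                            # True: u and v are the same product
print(max(u.values()), max(v.values()))  # 1 1: no component is used twice
```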


REFERENCES 

[1] B. de Finetti, "La 'logica del plausibile' secondo la concezione di Polya," Atti della XLII Riunione della Società Italiana per il Progresso delle Scienze, 1949 (1951), pp. 1-10.

[2] W. Fenchel, Convex Cones, Sets and Functions, Princeton University Press, 1953.

[3] B. O. Koopman, "The axioms and algebra of intuitive probability," Ann. of Math., Vol. 41 (1940), pp. 269-292.

[4] L. J. Savage, Foundations of Statistics, John Wiley, New York, 1954.





K-SAMPLE ANALOGUES OF THE KOLMOGOROV-SMIRNOV 
AND CRAMÉR-V. MISES TESTS


J. Kiefer¹


Cornell University 


0. Summary. The main purpose of this paper is to obtain the limiting distribution of certain statistics described in the title. It was suggested by the author in [1] that these statistics might be useful for testing the homogeneity hypothesis $H_1$ that $k$ random samples of real random variables have the same continuous probability law, or the goodness-of-fit hypothesis $H_2$ that all of them have some specified continuous probability law. Most tests of $H_1$ discussed in the existing literature, or at least all such tests known to the author before [1] in the case $k > 2$, have only been shown to have desirable consistency or power properties against limited classes of alternatives (see, e.g., [2], [3], [4] for lists of references on these tests), while those suggested here are shown to be consistent against all alternatives and to have good power properties. Some test statistics whose distributions can be computed from known results are also listed.


1. Introduction. Let $X_{ij}$ be independent random variables ($1 \leq i \leq n_j$, $1 \leq j \leq k$), $X_{ij}$ having unknown continuous distribution function (d.f.) $F_j$.
We are going to consider tests of two hypotheses, the homogeneity hypothesis

(1.1)  $H_1: F_1 = F_2 = \cdots = F_k$,

and the goodness-of-fit hypothesis

(1.2)  $H_2: F_1 = F_2 = \cdots = F_k = G$,

where $G$ is some specified continuous d.f. In the case of $H_1$, the hypothesis allows the common unknown d.f. to be any continuous d.f. The class of alternatives to $H_1$ or $H_2$ can be considered to be all sets $(F_1, \ldots, F_k)$ which violate (1.1) or (1.2), respectively; in discussing power under alternatives, continuity of the $F_j$ is irrelevant.

Let

$$S^{(j)}_{n_j}(x) = n_j^{-1}\,(\text{number of } X_{ij} \leq x,\ 1 \leq i \leq n_j)$$

be the sample d.f. of the $n_j$ observations in the $j$th set. We shall omit the subscript $n_j$ whenever this causes no confusion. For $k = 1$ the Kolmogorov test [5] and Cramér-v. Mises $\omega^2$ test [6] of $H_2$, and for $k = 2$ the Smirnov test [7] and the 2-sample analogue of the $\omega^2$ test of $H_1$ considered by Lehmann [8] and Rosenblatt [9], may be thought of as test criteria based on simple measurements of distance between $S^{(1)}$ and $G$ or between $S^{(1)}$ and $S^{(2)}$, respectively. (In this

Received August 15, 1955; revised June 26, 1958. 

1 Research under contract with the Office of Naval Research.







K-SAMPLE ANALOGUES 421 


paper, the word "distance" is not used in the technical sense; see [23], following (5.1).) In [1], several analogous measurements of distance (dispersion) among the $S^{(j)}$ were suggested for testing $H_1$ or $H_2$ when $k$ is larger than 2. For example, for testing $H_1$, some of the most obvious analogues are


$$U = \sup_{g,r,x} C_{g,r}\,|S^{(g)}(x) - S^{(r)}(x)|,$$
$$V = \sup_{g,x} C_g\,|S^{(g)}(x) - S(x)|,$$
$$T = \sup_x \sum_j C_j\,[S^{(j)}(x) - S(x)]^2,$$
$$W = \int_{-\infty}^{\infty} \sum_j C_j\,[S^{(j)}(x) - S(x)]^2\, dS(x),$$
$$Z = \max_j \int_{-\infty}^{\infty} C_j\,[S^{(j)}(x) - S(x)]^2\, dS(x),$$

where $C_{g,r}$ and $C_j$ are positive constants (see, however, the next paragraph) and $S(x) = \sum_j n_j S^{(j)}_{n_j}(x) / \sum_j n_j$ is the sample d.f. of the pooled $k$ samples. Similarly, for testing $H_2$, one might use corresponding statistics $U'$, $V'$, $T'$, $W'$ or $Z'$, obtained from the above by writing $G$ for $S^{(r)}$ or $S$. Each of this last collection of statistics has a distribution which does not depend on $G$ in the case that $H_2$ is true, and each of the first collection has a distribution which does not depend on what the common d.f. is when $H_1$ is true. In all cases, large values of the statistic lead to rejection of the hypothesis. It is clear that an appropriate choice of the $C_j$ and $C_{g,r}$ in the case $k = 1$ of $H_2$ or the case $k = 2$ of $H_1$ reduces each of these tests to one of those previously mentioned for those cases in [5], [6], [7], [8], [9] (in the case of [8] and [9], the integrating measure is altered slightly, as discussed in connection with (2.8) below).
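Since the $S^{(j)}$ and $S$ are step functions that jump only at the pooled observations, all five statistics can be computed exactly from the pooled sample. The sketch below assumes no ties and takes the constants as given (the example uses $C_{g,r} = 1$ and $C_j = n_j$); the function names are ours, and the formulas are the displayed definitions of $U$, $V$, $T$, $W$, $Z$.

```python
from bisect import bisect_right


def ecdf(sample):
    """Right-continuous sample d.f.: x -> (number of observations <= x) / n."""
    xs = sorted(sample)
    return lambda x: bisect_right(xs, x) / len(xs)


def k_sample_statistics(samples, C_pair, C):
    """U, V, T, W, Z for k samples; C_pair[g][r] and C[j] are the constants."""
    k = len(samples)
    n = [len(s) for s in samples]
    N = sum(n)
    F = [ecdf(s) for s in samples]
    pooled = sorted(x for s in samples for x in s)
    U = V = T = W = 0.0
    Z = [0.0] * k
    # Every S^(j) and the pooled S are constant between pooled observations,
    # so each supremum is attained at one of them, and (with no ties) dS puts
    # mass 1/N on each pooled observation.
    for x in pooled:
        Fx = [F[j](x) for j in range(k)]
        Sx = sum(n[j] * Fx[j] for j in range(k)) / N
        U = max([U] + [C_pair[g][r] * abs(Fx[g] - Fx[r])
                       for g in range(k) for r in range(k)])
        V = max([V] + [C[g] * abs(Fx[g] - Sx) for g in range(k)])
        terms = [C[j] * (Fx[j] - Sx) ** 2 for j in range(k)]
        T = max(T, sum(terms))
        W += sum(terms) / N
        for j in range(k):
            Z[j] += terms[j] / N
    return U, V, T, W, max(Z)


samples = [[1.0, 2.0], [3.0, 4.0]]
n = [len(s) for s in samples]
stats = k_sample_statistics(samples, C_pair=[[1, 1], [1, 1]], C=n)
print(stats)  # (1.0, 1.0, 1.0, 0.375, 0.1875)
```

With $C_j = n_j$ the values of $T$ and $W$ are those of the statistics $T_N$ and $W_N$ studied below.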

Many tests may be constructed along similar lines by allowing the $C_j$ and $C_{g,r}$ to be functions (of the $S^{(j)}$ for $H_1$ and of $G(x)$ for $H_2$) as in the treatments of Kac [11] and Anderson and Darling [12] when $k = 1$, by using other measures of distance or dispersion, etc. In Section 5 we shall mention a few statistics whose limiting distributions are easy to obtain from those of the usual Kolmogorov-Smirnov and $\omega^2$ statistics, but which are intuitively less appealing than those we have mentioned, especially from a practical point of view. In fact, the limiting distribution of $V'$ or $Z'$ (suitably normalized) is that of the maximum of multiples of $k$ independent random variables with limiting Kolmogorov or $\omega^2$ distributions, and is thus trivial to obtain from these latter distributions. From a practical point of view, the problem of testing $H_2$ may thus seem to be satisfactorily answered by these statistics.

Thus, our main goal is to obtain the limiting distribution under $H_1$ of appropriate statistics for testing that hypothesis, and the corresponding results we shall obtain for tests of $H_2$ are less important by-products of the investigation. Specifically, in Section 3 we shall obtain the limiting distribution of $T$ (and $T'$) for $C_j = n_j$, as the $n_j \to \infty$, while in Section 4 we obtain the limiting distribution of $W$ (and $W'$) under the same conditions. The limiting distributions





422 J. KIEFER 


of U, V, and Z seem more difficult to obtain, and the methods of this paper do 
not apply at all to those statistics. 

Many different proofs of the Kolmogorov-Smirnov results [5] and [7] now exist. Combinatorial proofs such as those of Feller [10] and of several papers by Russian authors (such as Smirnov, Gnedenko, Korolyuk) seem inapplicable to the problem of obtaining the limiting distribution of the generalizations $T$ and $T'$ of the Kolmogorov-Smirnov statistics. The geometric aspects of Doob's proof [13] clearly cannot be directly generalized. However, the approach used by Kac in several papers since 1949, e.g., in [11], to obtain various results such as that of Kolmogorov, can be generalized with some slight technical modifications to give results on the Wiener process in dimensions $> 1$ which can be used with an analogue of Donsker's result [14] to obtain the limiting distribution of $T$; such results for closely related problems have in fact been studied by Rosenblatt [17]. The method of Anderson and Darling [12] could also be used, but perhaps guessing the solution to the appropriate diffusion equation is more difficult than the approach used here.

In Section 2, therefore, we reduce the problem of finding the limiting distribution of $T$ or $T'$ to a calculation regarding a multidimensional Wiener process, and outline the steps to be carried out in performing this calculation. The solution is then obtained in Section 3. A similar method will work for the limiting distributions of $W$ and $W'$, but these may be obtained more easily by convolving the usual $\omega^2$ distribution with itself an appropriate number of times (Section 4). In Section 5 the statistics mentioned three paragraphs above, whose distributions may be obtained from existing tables, are discussed. The power of the tests considered in this paper is discussed briefly in Section 6, where several other remarks are made. Finally, Section 7 contains tables of some of the limiting distributions obtained in the paper.


2. Reduction of the problem. We hereafter write $N$ for the vector $(n_1, \ldots, n_k)$ and consider (now exhibiting the dependence on $N$)

$$T_N = \sup_x \sum_j n_j\,[S^{(j)}_{n_j}(x) - S_N(x)]^2,$$
$$T'_N = \sup_x \sum_j n_j\,[S^{(j)}_{n_j}(x) - G(x)]^2,$$
$$W_N = \int \sum_j n_j\,[S^{(j)}_{n_j}(x) - S_N(x)]^2\, dS_N(x),$$
$$W'_N = \int \sum_j n_j\,[S^{(j)}_{n_j}(x) - G(x)]^2\, dG(x).$$

(We shall also consider extensions of $W_N$; see equation (2.8).) Since the distribution of each of these statistics does not depend on $G$ (resp., on the common d.f.) if $H_2$ (resp., $H_1$) is true, we shall as usual perform our calculations under the assumption that $G$ and all $F_j$ are the uniform d.f. on the unit interval.
Let $Y_1, Y_2, \ldots, Y_h$ be $h$ independent separable Gaussian processes whose sample functions are functions of the same "time" parameter $t$, $0 \leq t \leq 1$, and







such that $EY_j(t) = 0$ and $EY_j(t)Y_j(s) = \min(s, t) - st$ for each $j$. Thus, the $Y_j$ are independent "tied-down Wiener processes" which may be represented as $Y_j(t) = (1 - t)\,w_j(t/(1 - t))$, where the $w_j$ are independent Wiener processes of the usual variety; i.e., $w_j$ is a separable Gaussian process of independent increments with $Ew_j(\tau) = 0$ and $Ew_j(\tau)w_j(\sigma) = \min(\tau, \sigma)$ for $0 \leq \tau, \sigma < \infty$. The use of such processes in [11], [12], [13] to obtain the Kolmogorov-Smirnov results is well known. Let


$$A_h(a) = P\Big\{\max_{0 \leq t \leq 1} \sum_{j=1}^{h} [Y_j(t)]^2 \leq a\Big\},$$

$$B_h(a) = P\Big\{\int_0^1 \sum_{j=1}^{h} [Y_j(t)]^2\, dt \leq a\Big\}.$$
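For $h = 1$ the event $\{\max Y_1^2 \leq a\}$ is $\{\sup |Y_1| \leq \sqrt{a}\}$, so $A_1(a)$ is Kolmogorov's classical limiting distribution evaluated at $\sqrt{a}$. The sketch below evaluates it from the standard alternating series; this is the known one-dimensional result, not the method the paper develops for general $h$, and the truncation at 100 terms is our choice.

```python
import math


def A1(a, terms=100):
    """P{ max_{0<=t<=1} Y(t)^2 <= a } for a single tied-down Wiener process.

    Since max Y^2 <= a  iff  sup |Y| <= sqrt(a), this is Kolmogorov's limit law
    sum_{j=-inf}^{inf} (-1)^j exp(-2 j^2 a), truncated after `terms` terms
    (adequate unless a is extremely small).
    """
    if a <= 0:
        return 0.0
    return 1.0 + 2.0 * sum((-1) ** j * math.exp(-2.0 * j * j * a)
                           for j in range(1, terms + 1))


print(round(A1(1.0), 4))          # 0.73
print(round(A1(1.358 ** 2), 3))   # 0.95, the usual 5% point of Kolmogorov's test
```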


When $G$ is the uniform d.f., the $k$ random functions

$$v_N^{(j)}(t) = \sqrt{n_j}\,\big(S^{(j)}_{n_j}(t) - t\big), \qquad 0 \leq t \leq 1,$$

are independent of each other and as $n_j \to \infty$ their behavior approaches that of the processes $Y_1, \ldots, Y_h$ with $h = k$. More precisely, an obvious extension of the argument of Donsker [14] or Theorem 2 of Kiefer and Wolfowitz [15] to the present case shows at once that, at all continuity points of the limit (which, we shall see, means for all $a$),

(2.3)  $\lim_{\text{all } n_j \to \infty} P\{T'_N \leq a\} = A_k(a)$

and

(2.4)  $\lim_{\text{all } n_j \to \infty} P\{W'_N \leq a\} = B_k(a)$.
Similarly, let $H$ be a $k \times k$ orthogonal matrix such that the $j$th element of the first row of $H$ is $(n_j/\sum_i n_i)^{1/2}$ for $1 \leq j \leq k$, and write $v_N$ for the $k$-vector whose $j$th component is the random function $v_N^{(j)}$. We have already discussed the asymptotic behavior of $v_N$ as the $n_j \to \infty$. The extension of the results of Donsker [14] or Kiefer and Wolfowitz [15] to the present case shows, on considering the sum of squares of the last $k - 1$ components of $Hv_N$, which sum is equal to $\sum_j n_j[S^{(j)}_{n_j}(t) - S_N(t)]^2$, that

(2.5)  $\lim_{\text{all } n_j \to \infty} P\{T_N \leq a\} = A_{k-1}(a)$.
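The identity used here holds pointwise in $t$ for any orthogonal $H$ with the stated first row, because the first component of $Hv_N$ is $\sqrt{N}\,(S_N(t) - t)$ with $N = \sum_j n_j$. The sketch below builds one such $H$ as a Householder reflection (our choice; the text only requires some orthogonal $H$) and checks the identity at a single $t$, with random numbers standing in for the values $S^{(j)}(t)$.

```python
import math
import random


def first_row_orthogonal(u):
    """Householder reflection H = I - 2 w w^T / (w^T w) with w = e_1 - u.
    For a unit vector u != e_1, H is symmetric and orthogonal and H e_1 = u,
    so its first row is u."""
    k = len(u)
    w = [(1.0 if i == 0 else 0.0) - u[i] for i in range(k)]
    ww = sum(x * x for x in w)
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / ww for j in range(k)]
            for i in range(k)]


random.seed(1)
n = [3, 5, 2]                       # illustrative sample sizes n_j
N = sum(n)
t = 0.4
S = [random.random() for _ in n]    # stand-ins for the values S^(j)(t)

H = first_row_orthogonal([math.sqrt(nj / N) for nj in n])
v = [math.sqrt(nj) * (Sj - t) for nj, Sj in zip(n, S)]          # v_N^(j)(t)
Hv = [sum(H[i][j] * v[j] for j in range(len(n))) for i in range(len(n))]

S_pooled = sum(nj * Sj for nj, Sj in zip(n, S)) / N
last_k_minus_1 = sum(x * x for x in Hv[1:])
between = sum(nj * (Sj - S_pooled) ** 2 for nj, Sj in zip(n, S))
print(abs(last_k_minus_1 - between) < 1e-12)  # True
```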
We remark that, as in the case $h = 1$, if $F_j$ is not continuous, the statistics $T_N$ and $T'_N$ are equivalent to statistics obtained for the case of continuous $F_j$ by taking the supremum over a restricted range; thus, the d.f. of $T_N$ or $T'_N$ in such a case is not larger than what it is for continuous $F_j$.

Next, we consider $W_N$. Since we need to prove statements which differ slightly from those of Rosenblatt [9], and since the partial integrations in [9] require some alterations, we shall carry out the required demonstration in full here







rather than refer elsewhere.² We shall actually prove without extra difficulty a more general result than that needed here, but one which is useful in reducing the calculation of the limiting distribution of other integral criteria in the same way that we reduce that of $W_N$. Our result is (roughly) that an integral criterion formed by integrating with respect to a consistent estimator of the common $F_j$ has the same limiting distribution if the consistent estimator is replaced by $F_j$. The following statement of it is thus easily generalized:

Lemma. Let $D \geq 0$ be a continuous function of $k - 1$ real variables which is bounded on bounded sets and such that


(2.6) [ a [ D( ty, +++ tea) [tr + + tealenGit +H 2dty +++ dha < x. 


Then, for each $j$, when all $F_j$ are uniform on $(0, 1)$,
1 
(2.7) I Doh? (t) — vO (t) , +>»  oRP (Ct) — P(t) a(SP(t) — 8) 


converges to 0 in probability as all $n_j \to \infty$.

Proof. It was proved by Dvoretzky, Kiefer, and Wolfowitz [16] that $P\{\sup_t |v_N^{(j)}(t)| > r\} \leq c\,e^{-2r^2}$ for all $n_j$ and $r$, where $c$ is a positive constant. Hence, (2.6) implies that if in (2.7) we replace the function $D$ by $\min(D, L)$, where $L$ is a constant, (2.7) is altered by a quantity which goes to 0 in probability as the constant $L \to \infty$, uniformly in the $n_j$. Hence, it suffices to prove (2.7) assuming $D$ is bounded and uniformly continuous, which we now assume. The proof of Theorem 2 of Kiefer and Wolfowitz [15] shows that for any $\epsilon > 0$ there is a value $m$ such that the probability that

$$\sup_{i/m \leq t \leq (i+1)/m} |v_N^{(j)}(t) - v_N^{(j)}(i/m)| < \epsilon$$

for all $i$ ($0 \leq i \leq m - 1$) is at least $1 - \epsilon$ for all sufficiently large $n_j$. Thus, given any $\epsilon' > 0$, we can choose $\epsilon$ (and thus $m$) with regard to the modulus of continuity of $D$, so that for all $n_j$ sufficiently large the probability will be $> 1 - \epsilon'$ that the value of the integrand of (2.7) varies over a range of length $< \epsilon'$ as $t$ varies from $i/m$ to $(i + 1)/m$, simultaneously for all $i$. On the other hand, when the $n_j$ are sufficiently large, $S^{(j)}_{n_j}$ assigns measure arbitrarily close to $1/m$ to each of the intervals $i/m \leq t \leq (i + 1)/m$, with probability arbitrarily close to 1. Since we have seen that $D$ may be assumed bounded, the assertion of the lemma now follows easily.

We conclude at once from the lemma and the use of the orthogonal transformation $H$ discussed in connection with $T_N$ that if $a_1, \ldots, a_k$ are real numbers with $\sum_j a_j = 1$, then

(2.8)  $\lim_{\text{all } n_j \to \infty} P\Big\{\int_{-\infty}^{\infty} \sum_j n_j\,[S^{(j)}_{n_j}(x) - S_N(x)]^2\, d\Big[\sum_j a_j S^{(j)}_{n_j}(x)\Big] \leq a\Big\} = B_{k-1}(a);$





2 Professor Rosenblatt has informed the author that he has constructed another correct proof of the result of [9], and has indicated that some corrections to [17] will appear shortly.







in particular, 


(2.9) lim P{Wy s a} = Byur(a). 

all nj+2 
The extension (2.8) of (2.9) includes, for example, integration with respect to $k^{-1}\sum_j S_j^*$, which is what is done in the case $k = 2$ by Rosenblatt [9]. It is easy to extend (2.8) to allow the $a_j$ to vary slightly with $N$, etc.

We note that we nowhere require the ratios $n_i/n_j$ to approach positive finite limits. This requirement, which is made in [7], [9], [10], and [13] in the case $k = 2$ of $H_2$, is inessential, and our remarks show that the results there hold without this restriction.


3. The limiting distribution of $T_N$ and $T'_N$. In [17] Rosenblatt studies the distribution of a class of suitably regular functionals of the $h$-dimensional process $Y = (Y_1, \cdots, Y_h)$ on $0 \le t \le 1$. We shall only state briefly the results we need from [17] and Kac's paper [11]. In fact, writing

$$A_c(t) = \Big\{(Y_1(t) + ct)^2 + \sum_{i=2}^{h} (Y_i(t))^2\Big\}^{1/2}$$

for $c \ge 0$, if one considers only nonnegative functions $v$ of $A_c$ which satisfy the regularity conditions of [17], then the analysis there may be shortened somewhat, and we now summarize the results we need in that briefer form; the reader may consult [11] or [17] for details.


For any $h$-vector $x$ and $t > 0$, with primes denoting transposes, write

$$Q_0(x, t) = (2\pi t)^{-h/2} e^{-x'x/2t} \tag{3.1}$$

and, for $n \ge 0$, with $E^h$ denoting Euclidean $h$-space and $d\xi = d\xi_1\, d\xi_2 \cdots d\xi_h$,

$$Q_{n+1}(x, t) = \int_0^t \int_{E^h} Q_0(x - \xi, t - \tau)\, v(\{\xi'\xi\}^{1/2})\, Q_n(\xi, \tau)\, d\xi\, d\tau. \tag{3.2}$$

It is easy to see that $Q_n$ depends on $x$ only through $x'x = r^2$ (say), so that we can write $Q_n(x, t) = Q_n(r, t)$. Define the generating function (in $u \ge 0$)


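$$Q(r, t, u) = \sum_{n=0}^{\infty} (-u)^n Q_n(r, t) \tag{3.3}$$

and, for $r > 0$, its transform (in $s \ge 0$)

$$\psi(r) = \psi_{s,u}(r) = \int_0^\infty Q(r, t, u)\, e^{-st}\, dt. \tag{3.4}$$

Write

$$\phi(r) = \phi_{s,u}(r) = r^{(h-1)/2}\, \psi_{s,u}(r). \tag{3.5}$$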


One proves easily that $\psi$ is the unique solution of the ordinary differential equation (for $r > 0$)

$$\psi''(r) + \frac{h-1}{r}\,\psi'(r) - [2s + 2uv(r)]\,\psi(r) = 0 \tag{3.6}$$





426 J. KIEFER 


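which satisfies

$$\begin{aligned} &\text{(a) } \psi(r) \to 0 \text{ as } r \to \infty;\\ &\text{(b) } \psi'(r) \text{ is continuous for } r > 0;\\ &\text{(c) as } r \to 0,\ \psi'(r) \sim -\Gamma(h/2)\,\pi^{-h/2}\, r^{1-h}. \end{aligned} \tag{3.7}$$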


It is sometimes convenient to rewrite (3.6) and (3.7) in other terms. For example, for $h > 1$ and suitably regular $v$, we can obtain $\phi$ as the unique solution (for $r > 0$) of

$$\phi''(r) - \Big[\frac{(h-1)(h-3)}{4r^2} + 2s + 2uv(r)\Big]\phi(r) = 0 \tag{3.6a}$$

which satisfies

$$\begin{aligned} &\text{(a) } \phi(r) \to 0 \text{ as } r \to \infty;\\ &\text{(b) } \phi'(r) \text{ is continuous for } r > 0;\\ &\text{(c) as } r \to 0,\ \phi(r) \sim \begin{cases} -\pi^{-1} r^{1/2} \log r & \text{if } h = 2,\\ \Gamma(h/2)\, r^{(3-h)/2} / [\pi^{h/2}(h-2)] & \text{if } h > 2. \end{cases} \end{aligned} \tag{3.7a}$$

(Equation (3.6) is merely the reduction to an ordinary differential equation of the partial differential equation of [17, equation (1.14)] when $v$ depends only on $x'x$; (3.7) for the case $h = 1$ is the analogue of [11], equation (3.14).)


Let $(w_1, \cdots, w_h)$ be the $h$-dimensional Wiener process described just above (2.1). Let

$$\sigma(t) = \int_0^t v\Big(\Big\{\sum_i [w_i(\tau)]^2\Big\}^{1/2}\Big)\, d\tau, \qquad \sigma(q; t) = P\{\sigma(t) < q\}. \tag{3.8}$$


The function $Q$ in the case of more general $v$ is studied by Rosenblatt [17] because, as in the case $h = 1$ of Kac [11], it is desired to compute $\sigma$, and

$$\int_0^\infty e^{-uq}\, d_q\sigma(q; t) = \int_{E^h} Q(\{x'x\}^{1/2}, t, u)\, dx. \tag{3.9}$$

But it can also be seen, as it was in [11], equation (6.16), when $h = 1$, that if

$$\eta_c = \int_0^1 v(A_c(t))\, dt, \qquad p_c(q) = P\{\eta_c < q\}, \tag{3.10}$$

then

$$\int_0^\infty e^{-uq}\, d_q p_c(q) = (2\pi)^{h/2} e^{c^2/2}\, Q(c, 1, u). \tag{3.11}$$
0 







This is the use of $Q$ which concerns us in obtaining distributions like those of (2.1) and (2.2).

In Kac's paper [11] it is only necessary to consider $\eta_0$, since $p_0$ is what we actually want to determine. However, $\psi_{s,u}(0)$ is infinite when $h > 1$, so that we are forced to consider $\eta_c$, determine $\psi_{s,u}(c)$ for $c$ near 0, invert this to obtain $Q(c, 1, u)$, and then let $c \to 0$ to obtain $Q(0, 1, u)$. This continuity in $c$ of $Q(c, 1, u)$ is proved by Rosenblatt [17] (it is also evident from the probabilistic meaning of $\eta_c$); the particular case of interest to us here involves another limit operation and will be discussed in the next paragraph.

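In order to obtain the function $A_h$ of (2.1), we consider, as did Kac [11] for the case $h = 1$, the function

$$v(r) = \begin{cases} 0 & \text{if } r < a,\\ 1 & \text{if } r \ge a, \end{cases} \tag{3.12}$$

where $a > 0$. From (3.10) and (3.11) we then have

$$P\Big\{\max_{0 \le t \le 1} A_c(t) < a\Big\} = (2\pi)^{h/2} e^{c^2/2} \lim_{u \to \infty} Q(c, 1, u). \tag{3.13}$$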


It is convenient to interchange the order of inverting with respect to $s$ and letting $u \to \infty$; i.e., by bounded convergence we have

$$\psi_{s,\infty}(c) = \lim_{u \to \infty} \psi_{s,u}(c) = \int_0^\infty \lim_{u \to \infty} Q(c, t, u)\, e^{-st}\, dt, \tag{3.14}$$

so that we can invert $(2\pi)^{h/2} e^{c^2/2}\psi_{s,\infty}(c)$ with respect to $s$ and set $t = 1$ to obtain the left side of (3.13) and then, from the probabilistic meaning of $A_c$, let $c \to 0$ and obtain, for $a \ge 0$,

$$A_h(a^2) = \lim_{c \to 0} P\Big\{\max_{0 \le t \le 1} A_c(t) < a\Big\}. \tag{3.15}$$
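The limit (3.15) with $c = 0$ can be checked roughly by simulation. The sketch below illustrates this; it is not a method from the paper. It assumes that $Y_1, \cdots, Y_h$ are independent tied-down Wiener processes (Brownian bridges), represented here as $w_i(t) - t\,w_i(1)$ on a grid of `n_steps` points. The function names and default sizes are hypothetical. Because the maximum is taken only over grid points, the estimate of $A_h(a^2)$ is biased slightly upward.

```python
import math
import random


def max_norm_of_bridges(h, n_steps, rng):
    """One draw of max_t {sum_i Y_i(t)^2}^{1/2}, with Y_i(t) = w_i(t) - t*w_i(1)
    for h independent Wiener paths sampled on the grid t = k/n_steps."""
    sd = math.sqrt(1.0 / n_steps)
    paths = []
    for _ in range(h):
        w = [0.0]
        for _ in range(n_steps):
            w.append(w[-1] + rng.gauss(0.0, sd))
        paths.append(w)
    best = 0.0
    for k in range(n_steps + 1):
        t = k / n_steps
        best = max(best, sum((w[k] - t * w[-1]) ** 2 for w in paths))
    return math.sqrt(best)


def estimate_A(h, a, n_paths=500, n_steps=500, seed=1):
    """Monte Carlo estimate of A_h(a^2) = P{max_t A_0(t) < a}; cf. (3.15) with c = 0."""
    rng = random.Random(seed)
    hits = sum(max_norm_of_bridges(h, n_steps, rng) < a for _ in range(n_paths))
    return hits / n_paths


print(estimate_A(2, 1.0))  # Table 1 lists Phi_2(1.00) = .411765
```

With a few hundred paths the sampling error is about .02, which is enough to see agreement with Table 1 but not to reproduce its six decimals.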
For the $v$ of (3.12), the solution of (3.6) satisfying the conditions (3.7) is easily obtained in terms of modified Bessel functions of the first and third kind ([18], Vol. 2, [20]). The solution is of the form $\phi(r) = r^{(h-1)/2}\psi(r) = C_1 r^{1/2} K_{(h-2)/2}(r(2s)^{1/2}) + C_2 r^{1/2} I_{(h-2)/2}(r(2s)^{1/2})$ for $0 < r < a$. It has the same form for $r \ge a$, with $s$ replaced by $s + u$ and with $C_1$ and $C_2$ replaced by $C_1'$ and $C_2'$ (say), where the $C_i$ and $C_i'$ depend on $s$ and $u$. From (3.7)(a) or (3.7a)(a) we obtain $C_2' = 0$, and from (3.7)(c) or (3.7a)(c) we obtain

$$C_1 = 2(2s)^{(h-2)/4}(2\pi)^{-h/2}.$$

The other two constants are obtained from the continuity of $\phi$ and $\phi'$ at $r = a$. In particular, writing $a(2s)^{1/2} = \alpha$ and $a(2s + 2u)^{1/2} = \beta$, we obtain


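$$\frac{C_2}{C_1} = \frac{K_{h/2}(\alpha)\, K_{(h-2)/2}(\beta) - (\beta/\alpha)\, K_{(h-2)/2}(\alpha)\, K_{h/2}(\beta)}{I_{h/2}(\alpha)\, K_{(h-2)/2}(\beta) + (\beta/\alpha)\, I_{(h-2)/2}(\alpha)\, K_{h/2}(\beta)}. \tag{3.16}$$

When we let $u$ (i.e., $\beta$) go to $\infty$, this ratio approaches the limit

$$-K_{(h-2)/2}(\alpha)/I_{(h-2)/2}(\alpha).$$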







Thus, we have, for $0 < r < a$ and $h \ge 1$,

$$\frac{(2\pi)^{h/2}\psi_{s,\infty}(r)}{2(2s)^{(h-2)/4}} = r^{-(h-2)/2}\Big[K_{(h-2)/2}(r(2s)^{1/2}) - \frac{K_{(h-2)/2}(a(2s)^{1/2})\, I_{(h-2)/2}(r(2s)^{1/2})}{I_{(h-2)/2}(a(2s)^{1/2})}\Big]. \tag{3.17}$$

(The corresponding formula and subsequent inversion in [17] are incorrect,² due to a mistake in evaluating $C_1$.)
To invert (3.17), we consider the Fourier-Bessel expansion of [18], Vol. 2, p. 104, equation (58):

$$\frac{J_\nu(xz)\,[J_\nu(z)Y_\nu(\lambda z) - Y_\nu(z)J_\nu(\lambda z)]}{J_\nu(z)} = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{J_\nu(\gamma_{\nu,n}x)\, J_\nu(\gamma_{\nu,n}\lambda)}{(z^2 - \gamma_{\nu,n}^2)\,[J_{\nu+1}(\gamma_{\nu,n})]^2}, \tag{3.18}$$

where $\gamma_{\nu,n}$ ($n = 1, 2, \cdots$) are the positive zeros of $J_\nu$, $\nu$ and $z$ are arbitrary, and $0 < x \le \lambda \le 1$. (A similar formula of Watson ([20], p. 499) seems incorrect, as can be seen in the case $\nu = 3$, $z \to 0$ there.) Divide both sides of (3.18) by $J_\nu(xz)$ and let $x \to 0$, noting that $J_\nu(\gamma_{\nu,n}x)/J_\nu(xz) \to (\gamma_{\nu,n}/z)^\nu$; it is easy to justify taking the limit inside the sum. Put $z = ia(2s)^{1/2}$ and $\lambda = r/a$. We then obtain, from (3.17), (3.18), and the relation of $I$ and $K$ to $J$ and $Y$, where $\nu = (h-2)/2$,


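$$(2\pi)^{h/2}\psi_{s,\infty}(r) = 4\sum_{n=1}^{\infty} \Big(\frac{\gamma_{\nu,n}}{ar}\Big)^{\nu} \frac{J_\nu(\gamma_{\nu,n} r/a)}{[J_{\nu+1}(\gamma_{\nu,n})]^2\,(2a^2 s + \gamma_{\nu,n}^2)}. \tag{3.19}$$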


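It is easy to see that this series can be inverted term-by-term with respect to $s$; inverting and setting $t = 1$, we have from (3.14),

$$P\Big\{\max_{0 \le t \le 1} A_r(t) < a\Big\} = 2e^{r^2/2}\sum_{n=1}^{\infty} \Big(\frac{\gamma_{\nu,n}}{ar}\Big)^{\nu} \frac{J_\nu(\gamma_{\nu,n} r/a)\, e^{-\gamma_{\nu,n}^2/2a^2}}{[J_{\nu+1}(\gamma_{\nu,n})]^2\, a^2}. \tag{3.20}$$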








Finally, letting $r \to 0$, we have, from (3.15) and (3.20),

THEOREM. For $h \ge 1$ (see also (3.27) and (3.31)),

$$A_h(a^2) = \frac{4}{\Gamma(h/2)\, 2^{h/2} a^h}\sum_{n=1}^{\infty} \frac{(\gamma_{(h-2)/2,n})^{h-2} \exp[-(\gamma_{(h-2)/2,n})^2/2a^2]}{[J_{h/2}(\gamma_{(h-2)/2,n})]^2}. \tag{3.21}$$

Thus, writing $\Phi_h(x) = A_h(x^2)$ for $x > 0$ and $\Phi_h(x) = 0$ otherwise, $\Phi_{k-1}$ and $\Phi_k$ are the limiting d.f.'s of $\sqrt{T_N}$ and $\sqrt{T'_N}$, respectively.
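The series (3.21) can be evaluated directly. The sketch below uses only the Python standard library. $J_\nu$ comes from its power series, which is adequate for the arguments up to about 20 used here. The zeros $\gamma_{\nu,n}$ are located by a sign-change scan followed by bisection. The helper names and the cutoff `x_max` are illustrative choices, not part of the paper. For $a$ much larger than about 2.5, more zeros and a more robust Bessel routine would be needed.

```python
import math


def bessel_j(nu, x, terms=80):
    """J_nu(x) for x > 0 from sum_m (-1)^m (x/2)^(2m+nu) / (m! Gamma(m+nu+1))."""
    half = 0.5 * x
    total = 0.0
    for m in range(terms):
        total += (-1) ** m * half ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1))
    return total


def bessel_zeros(nu, x_max=20.0, step=0.05):
    """Positive zeros of J_nu below x_max (sign-change scan, then bisection)."""
    zeros = []
    x0, f0 = step, bessel_j(nu, step)
    while x0 + step <= x_max:
        x1 = x0 + step
        f1 = bessel_j(nu, x1)
        if f0 * f1 < 0:
            lo, hi, flo = x0, x1, f0
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = bessel_j(nu, mid)
                if flo * fmid <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
        x0, f0 = x1, f1
    return zeros


def A(h, a):
    """A_h(a^2) = Phi_h(a) from the series (3.21)."""
    nu = 0.5 * (h - 2)
    total = 0.0
    for g in bessel_zeros(nu):
        total += g ** (h - 2) * math.exp(-g * g / (2 * a * a)) / bessel_j(nu + 1, g) ** 2
    return 4.0 * total / (math.gamma(0.5 * h) * 2 ** (0.5 * h) * a ** h)


for h in range(1, 6):
    print(h, round(A(h, 1.0), 6))  # compare with the x = 1.00 row of Table 1
```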

The series converges rapidly (see also the discussion of the two succeeding paragraphs for large $a$), but reduces to an expression in terms of elementary functions only when $h = 1$ or $h = 3$. When $h = 1$, we have $\gamma_{-1/2,n} = (2n-1)\pi/2$ and thus $[J_{1/2}(\gamma_{-1/2,n})]^2 = 4/[(2n-1)\pi^2]$. Thus, for $a > 0$,

$$A_1(a^2) = \frac{(2\pi)^{1/2}}{a}\sum_{n=1}^{\infty} e^{-(2n-1)^2\pi^2/8a^2}, \tag{3.22}$$







which is Smirnov's result, since $T_N$ is the square of the usual Smirnov statistic when $k = 2$. Similarly, for $h = 3$ we obtain, for $a > 0$,

$$A_3(a^2) = \frac{2^{1/2}\pi^{5/2}}{a^3}\sum_{n=1}^{\infty} n^2 e^{-n^2\pi^2/2a^2}. \tag{3.23}$$
In these cases we can obtain alternative expressions which are more useful for computations when $a$ is large. These may be obtained directly by using an appropriate transformation on a theta function, or by noting that (3.17) reduces to

$$(2\pi)^{h/2}\psi_{s,\infty}(r) = \pi^{1/2}\sinh[(a-r)(2s)^{1/2}] \big/ s^{1/2}\cosh[a(2s)^{1/2}]$$

when $h = 1$ and to

$$(2\pi)^{h/2}\psi_{s,\infty}(r) = (2\pi)^{1/2}\sinh[(a-r)(2s)^{1/2}] \big/ r\sinh[a(2s)^{1/2}]$$

when $h = 3$. These are tabled as theta function transforms in [19], Vol. 1, p. 258, equations (34) and (31), the first of which is wrong in sign. For $h = 1$ we obtain, letting $r \to 0$, the more familiar form of $A_1$ for $a > 0$,

$$A_1(a^2) = 1 + 2\sum_{n=1}^{\infty} (-1)^n e^{-2n^2a^2}. \tag{3.24}$$
(For $h = 1$, but not for $h = 3$, we could have let $r \to 0$ before inverting, and used [19], Vol. 1, p. 257, equation (24).) For $h = 3$, the inverse Laplace transform is given in terms of a derivative of a theta function; letting $r \to 0$ yields

$$A_3(a^2) = 1 + 2\sum_{n=1}^{\infty} (1 - 4n^2a^2)\, e^{-2n^2a^2}. \tag{3.25}$$
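As a numerical illustration, the two forms for $A_1$, (3.22) and (3.24), and the two forms for $A_3$, (3.23) and (3.25), can be summed directly and compared. This is a sketch, and the function names are our own. The first form of each pair converges fastest for small $a$, and the theta-transformed form converges fastest for large $a$. In between, they agree to rounding error.

```python
import math

PI2 = math.pi ** 2


def A1_small(a, terms=60):
    """(3.22): A_1(a^2) = (2 pi)^{1/2} / a * sum_n exp(-(2n-1)^2 pi^2 / 8a^2)."""
    return math.sqrt(2 * math.pi) / a * sum(
        math.exp(-((2 * n - 1) ** 2) * PI2 / (8 * a * a)) for n in range(1, terms + 1))


def A1_large(a, terms=60):
    """(3.24): A_1(a^2) = 1 + 2 sum_n (-1)^n exp(-2 n^2 a^2)."""
    return 1 + 2 * sum((-1) ** n * math.exp(-2 * n * n * a * a) for n in range(1, terms + 1))


def A3_small(a, terms=60):
    """(3.23): A_3(a^2) = 2^{1/2} pi^{5/2} / a^3 * sum_n n^2 exp(-n^2 pi^2 / 2a^2)."""
    return math.sqrt(2) * math.pi ** 2.5 / a ** 3 * sum(
        n * n * math.exp(-n * n * PI2 / (2 * a * a)) for n in range(1, terms + 1))


def A3_large(a, terms=60):
    """(3.25): A_3(a^2) = 1 + 2 sum_n (1 - 4 n^2 a^2) exp(-2 n^2 a^2)."""
    return 1 + 2 * sum((1 - 4 * n * n * a * a) * math.exp(-2 * n * n * a * a)
                       for n in range(1, terms + 1))


for a in (0.8, 1.0, 1.5, 2.0):
    print(a, A1_small(a), A1_large(a), A3_small(a), A3_large(a))
```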


The existence of the two forms for $A_1$ and $A_3$ suggests that a form more useful than (3.21) for large $a$ might be found. There seems to be no simple analogue of the theta function transformation for the series of (3.21), but in this and the next two paragraphs we mention other computational approaches which may prove useful. There are other Fourier-Bessel expansions which can be employed in inverting (3.17). For example, one series for $J_\nu(xz)/J_\nu(z)$ ([18], Vol. 2, p. 104, equation (59)) gives (writing $\nu$ for $(h-2)/2$)

$$(2\pi)^{h/2}\psi_{s,\infty}(r) = \frac{2(2s)^{\nu/2}K_\nu(r(2s)^{1/2})}{r^\nu} - 4\sum_{n=1}^{\infty} \frac{\gamma_{\nu,n}\, J_\nu(\gamma_{\nu,n} r/a)\,(2s)^{\nu/2} K_\nu(a(2s)^{1/2})}{r^\nu J_{\nu+1}(\gamma_{\nu,n})\,(2a^2s + \gamma_{\nu,n}^2)}. \tag{3.26}$$


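Now, by [19], Vol. 1, p. 283, equation (40), $2(2s)^{\nu/2}K_\nu(r(2s)^{1/2})/r^\nu$ is the transform of $t^{-\nu-1}e^{-r^2/2t}$, which becomes 1 at $t = 1$, $r \to 0$. Since $(2a^2s + \gamma^2)^{-1}$ is the transform of $e^{-\gamma^2 t/2a^2}/2a^2$ and $(a/r)^\nu J_\nu(\gamma r/a) \to \gamma^\nu/2^\nu\Gamma(\nu+1)$ as $r \to 0$, we obtain

$$A_h(a^2) = 1 - \frac{1}{2^\nu\Gamma(\nu+1)\,a^2}\sum_{n=1}^{\infty} \frac{(\gamma_{\nu,n})^{\nu+1}}{J_{\nu+1}(\gamma_{\nu,n})}\int_0^1 t^{-\nu-1} e^{-a^2/2t}\, e^{-(\gamma_{\nu,n})^2(1-t)/2a^2}\, dt. \tag{3.27}$$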







For computational purposes, this formula has the disadvantage of involving a numerical quadrature, but it has the advantage that the series converges rapidly for large $a$.

Another way of trying to obtain a more useful formula for large $a$ is to try to use the theta function transformation on a function close to that of (3.21). The following is such an approach when $h$ is odd. We again write $\nu = (h-2)/2$. Now, for large $n$ we have $\gamma_{\nu,n} \sim \pi(4n + 2\nu - 1)/4$ (see, e.g., [18], Vol. 2, pp. 60 and 85) and $[J_{\nu+1}(\gamma_{\nu,n})]^2 \sim 2/\pi\gamma_{\nu,n}$. Thus, an approximation to the summand of (3.21) is

$$f(\nu, n, a) = \frac{\pi^{2\nu+2}}{2}\Big[n + \frac{2\nu-1}{4}\Big]^{2\nu+1} e^{-\pi^2[n + (2\nu-1)/4]^2/2a^2}. \tag{3.28}$$
How good an approximation this is depends, of course, on the exponential term; but the form of (3.28) is suggestive of theta functions. In fact, the transformation $\theta_3(\tau^{-1}v \mid -\tau^{-1}) = (-i\tau)^{1/2} e^{i\pi v^2/\tau}\,\theta_3(v \mid \tau)$ ([18], Vol. 2, p. 370), on putting $\tau = -1/i\pi x$, becomes

$$\sum_{n=-\infty}^{\infty} e^{-\pi^2(n+v)^2 x} = (\pi x)^{-1/2}\sum_{n=-\infty}^{\infty} e^{2\pi i n v}\, e^{-n^2/x}, \tag{3.29}$$

so that, for $2v$ a nonnegative integer,

$$\sum_{n=1}^{\infty} e^{-\pi^2(n+v)^2 x} = \frac{1}{2(\pi x)^{1/2}}\sum_{n=-\infty}^{\infty} e^{2\pi i n v}\, e^{-n^2/x} - \frac{1}{2}\sum_{n=-2v}^{0} e^{-\pi^2(n+v)^2 x} = q_1(x) - q_2(x) \text{ (say)}. \tag{3.30}$$
Putting $v = (2\nu - 1)/4$, differentiating $(2\nu+1)/2$ times with respect to $x$, and denoting the summand of (3.21) by $g(\nu, n, a)$, we thus obtain for odd $h$

$$\frac{\Gamma(h/2)\,2^{h/2}a^h}{4}A_h(a^2) = \sum_{n=1}^{\infty}[g(\nu, n, a) - f(\nu, n, a)] + \frac{\pi}{2}\Big(-\frac{d}{dx}\Big)^{(2\nu+1)/2}[q_1(x) - q_2(x)]\Big|_{x = 1/2a^2}. \tag{3.31}$$

When $f$ is close to $g$, this will be a convenient formula, since $q_1$ converges rapidly as $a \to \infty$ and $q_2$ will contain only $2v + 1$ terms.
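The theta-function identity (3.29) and the split (3.30) can be sanity-checked numerically. The sketch below truncates each series at $|n| \le N$, where `N` is an arbitrary illustrative choice, and compares both sides. It checks only the identities behind (3.31), not the derivative step itself. On the right side of (3.29), the imaginary parts cancel in pairs, so only cosines remain.

```python
import math


def lhs_329(x, v, N=60):
    """Left side of (3.29), truncated to |n| <= N."""
    return sum(math.exp(-math.pi ** 2 * (n + v) ** 2 * x) for n in range(-N, N + 1))


def rhs_329(x, v, N=60):
    """Right side of (3.29); the sine parts cancel between n and -n."""
    s = sum(math.cos(2 * math.pi * n * v) * math.exp(-n * n / x) for n in range(-N, N + 1))
    return s / math.sqrt(math.pi * x)


def q1_minus_q2(x, v, N=60):
    """(3.30): sum_{n>=1} exp(-pi^2 (n+v)^2 x) as q_1(x) - q_2(x); 2v a nonnegative integer."""
    two_v = int(round(2 * v))
    q1 = 0.5 * rhs_329(x, v, N)
    q2 = 0.5 * sum(math.exp(-math.pi ** 2 * (n + v) ** 2 * x) for n in range(-two_v, 1))
    return q1 - q2


print(lhs_329(0.3, 0.25), rhs_329(0.3, 0.25))
```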

Another approach yields different expressions from (3.17) to invert and allows us to let $r \to 0$ before inverting with respect to $s$. It rests on the following observation: although the Laplace transform $\psi$ of $Q(r, t, u)$ is infinite for $r = 0$, the transform of $t^m Q(r, t, u)$ is finite there for $m$ an integer $> h/2$. But this is just $(-1)^m d^m\psi_{s,u}(r)/ds^m$. Thus, performing such a differentiation and letting $u \to \infty$ and $r \to 0$, we obtain an expression whose inverse transform with respect to $s$ at $t = 1$ gives $(2\pi)^{-h/2}A_h(a^2)$.

Tables of the functions $A_h$ will be found in Section 7. Even when $h$ is even, the computation is not very difficult. For example, when $h = 2$ the denominator of the summand of (3.21) is approximately $2/\pi\gamma_{0,n}$, as we have seen, and the series is easy to work with. For the next odd $h$ above those we have considered in detail, $h = 5$, the $\gamma_{3/2,n}$ are solutions of $\tan x = x$ and the summand of (3.21) is $\pi\gamma_{3/2,n}^2(\gamma_{3/2,n}^2 + 1)\,e^{-\gamma_{3/2,n}^2/2a^2}/2$.
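For $h = 5$ the ingredients of (3.21) are elementary, so the simplified summand can be checked without any Bessel-function library. The sketch below relies on the standard closed forms of $J_{3/2}$ and $J_{5/2}$ in terms of sines and cosines. It finds the roots of $\tan x = x$ by bisection and confirms that $\gamma^3/[J_{5/2}(\gamma)]^2 = \pi\gamma^2(\gamma^2+1)/2$ at those roots. It then sums (3.21). The function names are hypothetical.

```python
import math


def j32(x):
    """J_{3/2}(x) = (2/(pi x))^{1/2} (sin x / x - cos x)."""
    return math.sqrt(2 / (math.pi * x)) * (math.sin(x) / x - math.cos(x))


def j52(x):
    """J_{5/2}(x) = (2/(pi x))^{1/2} ((3/x^2 - 1) sin x - 3 cos x / x)."""
    return math.sqrt(2 / (math.pi * x)) * ((3 / x ** 2 - 1) * math.sin(x) - 3 * math.cos(x) / x)


def tan_gap(x):
    """sin x - x cos x, which vanishes exactly when tan x = x (away from cos x = 0)."""
    return math.sin(x) - x * math.cos(x)


def tan_roots(count):
    """Positive roots of tan x = x; the nth lies in (n pi, n pi + pi/2)."""
    roots = []
    for n in range(1, count + 1):
        lo, hi = n * math.pi, n * math.pi + math.pi / 2
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if tan_gap(lo) * tan_gap(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots


def A5(a, count=8):
    """A_5(a^2) from (3.21) with the simplified h = 5 summand."""
    total = sum(math.pi * g * g * (g * g + 1) * math.exp(-g * g / (2 * a * a)) / 2
                for g in tan_roots(count))
    return 4 * total / (math.gamma(2.5) * 2 ** 2.5 * a ** 5)


print(round(A5(1.0), 6), round(A5(2.0), 6))  # compare Table 1, column Phi_5
```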


4. The limiting distribution of $W_N$ and $W'_N$. The differential equation of (3.6) and (3.7) can be solved, when $v(r) = r^2$, in terms of a confluent hypergeometric function (specifically, by (3.7)(a), in terms of the Whittaker function $W_{\kappa,\mu}$). A more direct approach, however, is to reverse the order of integration and summation in (2.2) and note that the distribution $B_h$ is merely the $h$-fold convolution of $B_1$ with itself. In the case $h = 1$, it is well known that $(2\pi)^{1/2}Q(0, 1, u) = \{(2u)^{1/2}/\sinh(2u)^{1/2}\}^{1/2}$. Raising this to the $h$th power, we obtain $(2\pi)^{h/2}Q(0, 1, u)$ for general $h$. We can now follow a procedure like that of Anderson and Darling ([12], p. 201): we obtain, on integrating by parts,


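$$\int_0^\infty e^{-ua}B_h(a)\, da = u^{-1}\{(2u)^{1/2}/\sinh(2u)^{1/2}\}^{h/2}. \tag{4.1}$$

Using the binomial expansion on $[1 - e^{-2(2u)^{1/2}}]^{-h/2}$, (4.1) becomes

$$\int_0^\infty e^{-ua}B_h(a)\, da = 2^{3h/4}\sum_{j=0}^{\infty} \frac{\Gamma(j + h/2)}{j!\,\Gamma(h/2)}\, u^{-1+h/4}\, e^{-(2j+h/2)(2u)^{1/2}}. \tag{4.2}$$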


This series can be inverted term-by-term in terms of tabled transforms, without computations like those of [12]: from [19], Vol. 1, p. 246, equation (9), we find that $u^{-1+h/4}e^{-(2j+h/2)(2u)^{1/2}}$ is the Laplace transform of

$$2^{(2-h)/4}\pi^{-1/2}a^{-h/4}\, e^{-(2j+h/2)^2/4a}\, D_{h/2-1}\big((2j + h/2)/a^{1/2}\big),$$

where $D$ is the parabolic cylinder function. Thus, inverting (4.2) with respect to $u$, we obtain, for $a > 0$,

$$B_h(a) = \frac{2^{(h+1)/2}}{\pi^{1/2}a^{h/4}}\sum_{j=0}^{\infty} \frac{\Gamma(j + h/2)}{j!\,\Gamma(h/2)}\, e^{-(2j+h/2)^2/4a}\, D_{h/2-1}\big((2j + h/2)/a^{1/2}\big). \tag{4.3}$$

Thus, $B_{k-1}$ and $B_k$ are the limiting d.f.'s of $W_N$ and $W'_N$, respectively.
$B_h$ can be written in a more convenient form if $h$ is even. In that case, write $H_n$ for the $n$th Hermite polynomial, i.e., $H_n(x) = (-1)^n e^{x^2} d^n e^{-x^2}/dx^n$. From the relation between $D_n$ and $H_n$ ([18], Vol. 2, p. 117) we then obtain, for $a > 0$ and $h$ even,

$$B_h(a) = \frac{2^{(h+1)/2}}{2^{(h-2)/4}\pi^{1/2}a^{h/4}}\sum_{j=0}^{\infty} \frac{\Gamma(j + h/2)}{j!\,\Gamma(h/2)}\, e^{-(2j+h/2)^2/2a}\, H_{h/2-1}\big((2j + h/2)/(2a)^{1/2}\big). \tag{4.4}$$
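One way to check (4.4) numerically is to evaluate it on a grid and compare its numerical Laplace transform with the closed form (4.1). The sketch below does this for $h = 2$ and $h = 4$. The Hermite recurrence is standard. The truncation rule (`cutoff`), the grid (`da`, `a_max`) and the function names are illustrative assumptions, not the procedure used for the paper's Table 3.

```python
import math


def hermite(n, y):
    """Physicists' Hermite polynomial H_n(y) via H_{k+1} = 2y H_k - 2k H_{k-1}."""
    h_prev, h_cur = 1.0, 2.0 * y
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h_cur = h_cur, 2.0 * y * h_cur - 2.0 * k * h_prev
    return h_cur


def B_even(h, a, cutoff=60.0):
    """B_h(a) from (4.4) for even h and a > 0; stops once the exponent exceeds cutoff."""
    if h % 2 or h <= 0:
        raise ValueError("(4.4) applies to even h only")
    front = 2 ** ((h + 1) / 2) / (2 ** ((h - 2) / 4) * math.sqrt(math.pi) * a ** (h / 4))
    total, j = 0.0, 0
    while True:
        b = 2 * j + h / 2
        expo = b * b / (2 * a)
        if expo > cutoff:
            break
        coef = math.exp(math.lgamma(j + h / 2) - math.lgamma(j + 1) - math.lgamma(h / 2))
        total += coef * math.exp(-expo) * hermite(h // 2 - 1, b / math.sqrt(2 * a))
        j += 1
    return front * total


def laplace_of_B(h, u, a_max=60.0, da=0.01):
    """Trapezoidal approximation to the left side of (4.1); the a = 0 endpoint contributes 0."""
    n = int(round(a_max / da))
    s = 0.5 * math.exp(-u * a_max) * B_even(h, a_max)
    for i in range(1, n):
        a = i * da
        s += math.exp(-u * a) * B_even(h, a)
    return s * da


def laplace_closed_form(h, u):
    """Right side of (4.1)."""
    r = math.sqrt(2 * u)
    return (r / math.sinh(r)) ** (h / 2) / u


for h in (2, 4):
    print(h, laplace_of_B(h, 1.0), laplace_closed_form(h, 1.0))
```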


When $h$ is odd, (4.3) can be written in terms of the Bessel functions $K_{1/4}$ and $K_{3/4}$, as follows. By [18], Vol. 2, p. 119, $D_{-1/2}(z) = (z/2\pi)^{1/2}K_{1/4}(z^2/4)$ and $D_{1/2}(z) = -e^{z^2/4}\, d[e^{-z^2/4}D_{-1/2}(z)]/dz = \pi^{-1/2}(z/2)^{3/2}[K_{1/4}(z^2/4) + K_{3/4}(z^2/4)]$. Successive use of the recursion relation $D_{\nu+1}(z) = zD_\nu(z) - \nu D_{\nu-1}(z)$, together with the fact that $K_\nu = K_{-\nu}$, then yields $D_{m-1/2}$, for $m$ a positive integer, in terms of $K_{1/4}$ and $K_{3/4}$.







In the case $h = 1$, substitution of the formula for $D_{-1/2}$ in terms of $K_{1/4}$ gives the formula of [12], equation (4.35).
Tables of $B_h$ will be found in Section 7.


5. Criteria whose distributions may be obtained from previously known results. We limit our discussion to criteria for testing $H_2$; analogues for testing $H_1$ are obvious, and some criteria have been mentioned in Section 1. We shall also limit our discussion to criteria of the Kolmogorov-Smirnov type, ones of the integral ($\omega^2$-) type being obtained similarly. Symbols newly defined in this section need not have their earlier meaning.

One of the simplest tests whose size may be computed from previously known results is that based on the maximum of the $k - 1$ random variables

$$Y_j = C_j \sup_x \Big|S_j(x) - \sum_{i<j} n_iS_i(x)\Big/\sum_{i<j} n_i\Big|, \qquad (2 \le j \le k).$$

These are obviously independent under $H_2$ (since, for example, the conditional distribution of $\sup_x|S_1(x) - S_2(x)|$ given the value of the function $n_1S_1 + n_2S_2$ does not depend on the latter). $Y_j$ is distributed like a multiple of the Smirnov 2-sample criterion for sample sizes $n_j$ and $\sum_{i<j} n_i$; thus, the tables of Massey [21] may be used in an obvious way to compute the d.f. of $\max_j Y_j$. Of course, asymptotically one may use the Kolmogorov-Smirnov distribution $A_1(a^2)$.
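For large samples, the independence of the $Y_j$ gives $P\{\max_j Y_j \le y\} \to [\Phi_1(y)]^{k-1}$. This holds provided each $C_j$ is chosen so that $Y_j$ has the Kolmogorov-Smirnov limit $\Phi_1$. For example, $C_j = \{n_j m_j/(n_j + m_j)\}^{1/2}$ with $m_j = \sum_{i<j} n_i$ would do, but that choice is an assumption here, since the section does not spell out the constants. A level-$\alpha$ critical value is then $\Phi_1^{-1}((1-\alpha)^{1/(k-1)})$. The sketch below computes it by bisection on (3.24); the function names are hypothetical.

```python
import math


def phi1(x, terms=100):
    """Phi_1(x) = A_1(x^2) via (3.24); adequate for x >= 0.3."""
    if x <= 0:
        return 0.0
    return 1 + 2 * sum((-1) ** n * math.exp(-2 * n * n * x * x) for n in range(1, terms + 1))


def max_Y_critical_value(k, alpha, lo=0.3, hi=4.0):
    """Asymptotic level-alpha critical value for max_{2<=j<=k} Y_j, assuming the k - 1
    independent Y_j each have limiting d.f. Phi_1."""
    target = (1 - alpha) ** (1 / (k - 1))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi1(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in (2, 3, 5):
    print(k, round(max_Y_critical_value(k, 0.05), 5))
```

For $k = 2$ this reduces to the Kolmogorov-Smirnov critical value, which can be read from the $\Phi_1^{-1}$ column of Table 2.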

This test may be made more symmetrical by choosing at random the indexing $j$ of the $k$ sets. Another method of symmetrizing is to subdivide each of the $k$ original sets of observations into $k!$ subsets, form $k!$ collections each of which contains one subset of each original set, index the subsets in each collection in a different one of the $k!$ possible ways, compute the maximum of the $Y_j$ for each collection, and take the maximum of these over all collections.

A test based upon the $Y_j$ of the previous paragraph is a special case of the class of tests based on the $k - 1$ quantities $Z_j = \sup_x|R_j(x)|$ ($j = 2, \cdots, k$), where the $R_j$ are any $k - 1$ orthogonal linear combinations of the $S_i$ which are orthogonal to $S$; however, the $Z_j$ will in general be independently distributed only in the limit, not for finite $n_j$ as with the $Y_j$.

For $k = 3$, the asymptotic behavior of $\max(Y_2, Y_3)$ was also noted by Fisz [22]. For $k > 3$ Fisz suggests dividing the $k$ samples into approximately $k/3$ collections of 3 or 2 samples each, computing the above or the Smirnov statistic from each collection, and then computing the maximum of these. The resulting test is clearly inferior to those we have considered: it is not even consistent, since it tests effectively only differences within the various collections.

Another simple test whose size may be computed from previously known results is the following. Let the $n_j$ observations in the $j$th sample be divided at random into $k - 1$ subsets, each subset containing approximately the same number of observations. Call the sample d.f.'s of the observations in the $k - 1$ subsets of the $j$th sample $S_{jr}(x)$ ($1 \le r \le k$, $r \ne j$). For any $j_1, j_2$ with $j_1 \ne j_2$, the distribution of $Z_{j_1j_2} = C_{j_1j_2}\sup_x|S_{j_1j_2}(x) - S_{j_2j_1}(x)|$ (where $C_{j_1j_2}$ is a suitable normalizing constant) may again be obtained from Massey's tables [21]. The size of a test of $H_2$ based on such a statistic as $\max_{j_1,j_2} Z_{j_1j_2}$ is again easily computed, since the $Z_{j_1j_2}$ are independent.

Tests based on statistics like $\sum_j Y_j$ are less convenient to use, since the computation of size entails the convolution of the Kolmogorov-Smirnov d.f. $\Phi_1(x) = A_1(x^2)$ with itself. For example, a single convolution of $\Phi_1$ with itself using term-by-term integration of (3.24) yields the d.f. $G_2$, given for $z > 0$ by a slowly converging double sum of terms involving the normal d.f., and this is extremely poor for computational purposes. It is in fact easier to obtain $G_2$ by numerical integration of the convolution formula, and this has been done to obtain a table of $G_2$ in Section 7.
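The numerical convolution mentioned above can be carried out along the following lines. This is a sketch of one possible quadrature, a midpoint Riemann-Stieltjes sum on a uniform grid, and not the procedure used for the paper's table. $\Phi_1$ is evaluated from (3.22) for small arguments and from (3.24) otherwise. The step size `dy` and the function names are illustrative choices.

```python
import math


def phi1(x, terms=40):
    """Phi_1(x) = A_1(x^2): (3.22) for x < 1, (3.24) for x >= 1."""
    if x <= 0:
        return 0.0
    if x < 1:
        s = sum(math.exp(-((2 * n - 1) ** 2) * math.pi ** 2 / (8 * x * x))
                for n in range(1, terms + 1))
        return math.sqrt(2 * math.pi) / x * s
    return 1 + 2 * sum((-1) ** n * math.exp(-2 * n * n * x * x) for n in range(1, terms + 1))


def G2(z, dy=0.002):
    """G_2(z) = P{X + Y <= z} for independent X, Y with d.f. Phi_1, approximated by
    sum_i Phi_1(z - y_mid) * [Phi_1(y_{i+1}) - Phi_1(y_i)] over a grid on [0, z]."""
    if z <= 0:
        return 0.0
    m = max(1, int(round(z / dy)))
    step = z / m
    p = [phi1(j * step / 2) for j in range(2 * m + 1)]  # Phi_1 on the half-grid
    return sum(p[2 * m - 2 * i - 1] * (p[2 * i + 2] - p[2 * i]) for i in range(m))


for z in (1.5, 2.0, 2.5):
    print(z, round(G2(z), 6))
```

Since $X + Y \le z$ whenever both are $\le z/2$, and $X + Y \le z$ forces both to be $\le z$, any computed $G_2(z)$ must lie between $\Phi_1(z/2)^2$ and $\Phi_1(z)^2$. This gives a cheap sanity check.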


6. Power; miscellaneous remarks. We again limit the discussion to tests of $H_2$, similar remarks applying for $H_1$. We use the notation of Section 1.

It is easily seen that, for the test of size (approximately) $\alpha > 0$ based on $T$, $U$, $V$, or any of the procedures listed in the previous section (excluding that of Fisz [22] for $k > 3$), for any $\beta < 1$ there is a value $\delta(\alpha, \beta)$ such that any of these tests has power $> \beta$ against all alternatives for which

$$\sup_{x, r, s}\{|F_r(x) - F_s(x)|\min(n_r^{1/2}, n_s^{1/2})\} > \delta(\alpha, \beta).$$

However, tests based on criteria such as $Z$ or $W$ cannot be guaranteed to have the property just cited; this may be demonstrated exactly as it was for $\omega^2$-type tests in another problem in the paper by Kac, Kiefer, and Wolfowitz [23]. Similar results may be proved relative to other measures of distance of alternatives from $H_2$, as in [23]. Thus, distance tests of the Kolmogorov-Smirnov type seem preferable in applications to those of the $\omega^2$-type.

We note that the distribution of $A_c$ obtained in Section 3 gives an asymptotic computation of power for certain alternatives when $T$ is used.

We remark that the methods of this paper may be modified along the lines of the papers by Darling [24] and Kac, Kiefer, and Wolfowitz [23] in parametric cases, e.g., to test the hypothesis $H_2$ under the assumption that the $F_j$ are all normal, or to test that the $F_j$ are equal and normal.

In the case $k = 3$ of $H_2$, when all $n_j$ are equal, David [25] has used a clever device to compute the distribution of $\max_{j,x}|S_j(x) - S_{j+1}(x)|$, where the subscripts are taken mod 3. The method does not seem to generalize.

The use of "distance" criteria in various nonparametric multi-decision problems, e.g., problems of ranking or of classification, is to be recommended, but the appropriate distribution theory is more complicated.

The author plans to return in another paper to consideration of some of the limiting distributions discussed here using a method somewhat similar to that of Doob [13].


7. Tables. The functions $A_h$ of Section 3 and $B_h$ of Section 4 ($1 \le h \le 5$), and the function $G_2$ defined in Section 5, have been tabled by the Cornell Computing Center's 650. I am indebted to Miss Susan Litt, Miss Virginia Walbran, Mrs. Jane Wiegand, Professor R. J. Walker, and Mr. R. C. Lesser, for carrying out this work.


(Continued at the foot of p. 438) 

















TABLE 1
Tables of $\Phi_i(x) = A_i(x^2)$ for $i = 1, 2, 3, 4, 5$

x    | $\Phi_1(x)$ | $\Phi_2(x)$ | $\Phi_3(x)$ | $\Phi_4(x)$ | $\Phi_5(x)$
0.37 | .000826 | | | |
0.38 | .001285 | | | |
0.39 | .001929 | | | |
0.40 | .002808 | | | |
0.41 | .003972 | | | |
0.42 | .005476 | | | |
0.43 | .007377 | | | |
0.44 | .009730 | | | |
0.45 | .012589 | | | |
0.46 | .016005 | | | |
0.47 | .020022 | | | |
0.48 | .024682 | | | |
0.49 | .030017 | | | |
0.50 | .036055 | | | |
0.51 | .042814 | | | |
0.52 | .050306 | | | |
0.53 | .058534 | .000894 | | |
0.54 | .067497 | .001256 | | |
0.55 | .077183 | .001731 | | |
0.56 | .087577 | .002342 | | |
0.57 | .098656 | .003115 | | |
0.58 | .110394 | .004079 | | |
0.59 | .122760 | .005262 | | |
0.60 | .135717 | .006696 | | |
0.61 | .149229 | .008412 | | |
0.62 | .163255 | .010441 | | |
0.63 | .177752 | .012816 | | |
0.64 | .192677 | .015566 | | |
0.65 | .207987 | .018720 | .000762 | |
0.66 | .223637 | .022307 | .001035 | |
0.67 | .239582 | .026350 | .001383 | |
0.68 | .255780 | .030874 | .001824 | |
0.69 | .272188 | .035897 | .002373 | |
0.70 | .288765 | .041437 | .003050 | |
0.71 | .305470 | .047507 | .003874 | |
0.72 | .322265 | .054116 | .004866 | |
0.73 | .339114 | .061271 | .006050 | |
0.74 | .355981 | .068976 | .007447 | |
0.75 | .372833 | .077230 | .009081 | |
0.76 | .389640 | .086029 | .010977 | .000820 |
0.77 | .406372 | .095367 | .013159 | .001080 |
0.78 | .423002 | .105233 | .015649 | .001406 |
0.79 | .439505 | .115614 | .018472 | .001810 |
0.80 | .455858 | .126496 | .021649 | .002306 |
0.81 | .472039 | .137859 | .025201 | .002907 |
0.82 | .488028 | .149685 | .029149 | .003631 |
0.83 | .503809 | .161950 | .033510 | .004493 |
0.84 | .519365 | .174632 | .038300 | .005511 |
0.85 | .534681 | .187705 | .043534 | .006704 |
0.86 | .549745 | .201142 | .049223 | .008092 | .000897
0.87 | .564545 | .214917 | .055378 | .009694 | .001157
0.88 | .579071 | .229001 | .062006 | .011530 | .001476
0.89 | .593315 | .243366 | .069112 | .013621 | .001867
0.90 | .607269 | .257982 | .076699 | .015986 | .002340
0.91 | .620928 | .272822 | .084766 | .018645 | .002908
0.92 | .634285 | .287855 | .093313 | .021618 | .003584
0.93 | .647337 | .303054 | .102333 | .024924 | .004382
0.94 | .660081 | .318390 | .111821 | .028579 | .005317
0.95 | .672514 | .333834 | .121767 | .032600 | .006407
0.96 | .684636 | .349361 | .132160 | .037004 | .007666
0.97 | .696445 | .364942 | .142988 | .041802 | .009113
0.98 | .707941 | .380554 | .154236 | .047009 | .010765
0.99 | .719126 | .396169 | .165887 | .052634 | .012639
1.00 | .730000 | .411765 | .177923 | .058687 | .014754
1.01 | .740566 | .427319 | .190326 | .065174 | .017127
1.02 | .750825 | .442800 | .203074 | .072101 | .019777
1.03 | .760781 | .458214 | .216146 | .079471 | .022720
1.04 | .770436 | .473514 | .229521 | .087284 | .025972
1.05 | .779794 | .488690 | .243174 | .095541 | .029551
1.06 | .788860 | .503725 | .257083 | .104239 | .033471
1.07 | .797637 | .518603 | .271223 | .113372 | .037747
1.08 | .806130 | .533308 | .285569 | .122935 | .042390
1.09 | .814343 | .547826 | .300099 | .132919 | .047414
1.10 | .822282 | .562143 | .314786 | .143314 | .052828
1.11 | .829951 | .576248 | .329607 | .154110 | .058642
1.12 | .837356 | .590130 | .344538 | .165291 | .064862
1.13 | .844502 | .603779 | .359554 | .176846 | .071495
1.14 | .851395 | .617184 | .374635 | .188756 | .078545
1.15 | .858040 | .630340 | .389749 | .201006 | .086015
1.16 | .864443 | .643237 | .404883 | .213577 | .093904
1.17 | .870610 | .655871 | .420012 | .226450 | .102213
1.18 | .876546 | .668235 | .435114 | .239605 | .110938
1.19 | .882258 | .680325 | .450170 | .253023 | .120075
1.20 | .887750 | .692137 | .465159 | .266681 | .129619
1.21 | .893030 | .703668 | .480064 | .280558 | .139562
1.22 | .898102 | .714916 | .494865 | .294632 | .149895
1.23 | .902973 | .725879 | .509546 | .308881 | .160607
1.24 | .907648 | .736555 | .524090 | .323283 | .171687
1.25 | .912134 | .746946 | .538483 | .337815 | .183121
1.26 | .916435 | .757050 | .552710 | .352455 | .194895
1.27 | .920557 | .766869 | .566758 | .367181 | .206993
1.28 | .924506 | .776403 | .580613 | .381971 | .219400
1.29 | .928288 | .785655 | .594266 | .396804 | .232097
1.30 | .931908 | .794626 | .607703 | .411658 | .245067
1.31 | .935371 | .803319 | .620917 | .426513 | .258290
1.32 | .938682 | .811737 | .633898 | .441348 | .271746
1.33 | .941847 | .819883 | .646638 | .456145 | .285417
1.34 | .944871 | .827761 | .659129 | .470884 | .299281
1.35 | .947758 | .835374 | .671366 | .485547 | .313318
1.36 | .950514 | .842727 | .683343 | .500117 | .327506
1.37 | .953143 | .849824 | .695055 | .514577 | .341825
1.38 | .955651 | .856670 | .706498 | .528911 | .356254
1.39 | .958041 | .863269 | .717669 | .543104 | .370771
1.40 | .960318 | .869627 | .728564 | .557141 | .385356
1.41 | .962487 | .875748 | .739183 | .571009 | .399989
1.42 | .964551 | .881638 | .749523 | .584696 | .414648
1.43 | .966515 | .887302 | .759585 | .598190 | .429314
1.44 | .968383 | .892745 | .769367 | .611479 | .443968
1.45 | .970158 | .897973 | .778871 | .624554 | .458590
1.46 | .971846 | .902992 | .788096 | .637405 | .473163
1.47 | .973448 | .907808 | .797046 | .650025 | .487667
1.48 | .974969 | .912425 | .805720 | .662404 | .502087
1.49 | .976413 | .916849 | .814122 | .674537 | .516406
1.50 | .977782 | .921086 | .822255 | .686418 | .530607
1.51 | .979080 | .925142 | .830121 | .698041 | .544676
1.52 | .980310 | .929023 | .837724 | .709401 | .558598
1.53 | .981475 | .932733 | .845067 | .720496 | .572360
1.54 | .982579 | .936278 | .852154 | .731321 | .585948
1.55 | .983623 | .939664 | .858990 | .741874 | .599352
1.56 | .984610 | .942897 | .865579 | .752155 | .612560
1.57 | .985544 | .945980 | .871926 | .762160 | .625561
1.58 | .986427 | .948921 | .878036 | .771890 | .638346
1.59 | .987261 | .951723 | .883913 | .781345 | .650906
1.60 | .988048 | .954393 | .889563 | .790525 | .663233
1.61 | .988791 | .956934 | .894991 | .799432 | .675320
1.62 | .989492 | .959352 | .900203 | .808066 | .687161
1.63 | .990154 | .961651 | .905203 | .816430 | .698749
1.64 | .990777 | .963837 | .909998 | .824526 | .710081
1.65 | .991364 | .965913 | .914593 | .832356 | .721151
1.66 | .991917 | .967885 | .918994 | .839925 | .731957
1.67 | .992438 | .969756 | .923206 | .847235 | .742495
1.68 | .992928 | .971530 | .927235 | .854290 | .752763
1.69 | .993389 | .973213 | .931087 | .861094 | .762760
1.70 | .993823 | .974807 | .934766 | .867651 | .772485
1.71 | .994230 | .976317 | .938280 | .873967 | .781936
1.72 | .994612 | .977746 | .941633 | .880045 | .791116
1.73 | .994972 | .979099 | .944830 | .885891 | .800024
1.74 | .995309 | .980378 | .947878 | .891509 | .808660
1.75 | .995625 | .981586 | .950781 | .896905 | .817028
1.76 | .995922 | .982728 | .953546 | .902084 | .825130
1.77 | .996200 | .983807 | .956176 | .907052 | .832966
1.78 | .996460 | .984824 | .958676 | .911813 | .840542
1.79 | .996704 | .985784 | .961053 | .916375 | .847859
1.80 | .996932 | .986689 | .963311 | .920741 | .854921
1.81 | .997146 | .987542 | .965455 | .924919 | .861732
1.82 | .997346 | .988345 | .967488 | .928913 | .868296
1.83 | .997533 | .989102 | .969417 | .932729 | .874618
1.84 | .997707 | .989813 | .971245 | .936373 | .880703
1.85 | | .990483 | .972976 | .939851 | .886554
1.86 | | .991112 | .974615 | .943167 | .892177
1.87 | | .991703 | .976166 | .946328 | .897578
1.88 | | .992259 | .977633 | .949338 | .902760
1.89 | | .992780 | .979019 | .952204 | .907731
1.90 | | .993269 | .980329 | .954931 | .912494
1.91 | | .993728 | .981566 | .957524 | .917056
1.92 | | .994158 | .982733 | .959987 | .921423
1.93 | | .994560 | .983833 | .962326 | .925599
1.94 | | .994938 | .984871 | .964547 | .929591
1.95 | | .995291 | .985848 | .966653 | .933404
1.96 | | .995621 | .986769 | .968649 | .937044
1.97 | | .995930 | .987635 | .970541 | .940517
1.98 | | .996219 | .988450 | .972332 | .943827
1.99 | | .996489 | .989216 | .974027 | .946981
2.00 | | .996741 | .989936 | .975631 | .949984
2.01 | | .996976 | .990612 | .977146 | .952842
2.02 | | .997195 | .991247 | .978578 | .955560
2.03 | | .997400 | .991843 | .979930 | .958142
2.04 | | .997591 | .992402 | .981206 | .960595
2.05 | | .997768 | .992925 | .982409 | .962924
2.06 | | .997934 | .993416 | .983543 | .965133
2.07 | | .998088 | .993875 | .984612 | .967227
2.08 | | .998231 | .994305 | .985618 | .969211
2.09 | | .998364 | .994707 | .986565 | .971090
2.10 | | .998488 | .995083 | .987455 | .972868
2.11 | | .998603 | .995434 | .988292 | .974549
2.12 | | .998710 | .995762 | .989079 | .976139
2.13 | | .998809 | .996069 | .989817 | .977640
2.14 | | .998901 | .996355 | .990511 | .979058
2.15 | | .998987 | .996621 | .991161 | .980396
2.16 | | .999066 | .996870 | .991770 | .981657
2.17 | | .999139 | .997101 | .992342 | .982846
2.18 | | | .997317 | .992877 | .983966
2.19 | | | .997518 | .993377 | .985020
2.20 | | | .997704 | .993846 | .986012
2.21 | | | .997878 | .994284 | .986945
2.22 | | | .998039 | .994693 | .987821
2.23 | | | .998189 | .995075 | .988645
2.24 | | | .998328 | .995432 | .989418
2.25 | | | .998458 | .995765 | .990143
2.26 | | | .998577 | .996076 | .990823
2.27 | | | .998688 | .996366 | .991460
2.28 | | | .998791 | .996635 | .992057
2.29 | | | .998887 | .996887 | .992616
2.30 | | | .998975 | .997120 | .993139
2.31 | | | .999057 | .997338 | .993628
2.32 | | | .999132 | .997540 | .994085
2.33 | | | | .997728 | .994512
2.34 | | | | .997902 | .994910
2.35 | | | | | .995282
2.36 | | | | .998215 | .995629
2.37 | | | | .998354 | .995952
2.38 | | | | .998483 | .996253
2.39 | | | | .998603 | .996534
2.40 | | | | .998714 | .996795
2.41 | | | | .998817 | .997038
2.42 | | | | .998911 | .997263
2.43 | | | | .998999 | .997473
2.44 | | | | .999080 | .997668
2.45 | | | | .999155 | .997849
2.46 | | | | | .998016
2.47 | | | | | .998172
2.48 | | | | | .998316
2.49 | | | | | .998449
2.50 | | | | | .998573
2.51 | | | | | .998687
2.52 | | | | | .998793
2.53 | | | | | .998891
2.54 | | | | | .998981
2.55 | | | | | .999065
2.56 | | | | | .999142

TABLE 2
Table of the inverses $\Phi_i^{-1}(p)$

p     | $\Phi_1^{-1}(p)$ | $\Phi_2^{-1}(p)$ | $\Phi_3^{-1}(p)$ | $\Phi_4^{-1}(p)$ | $\Phi_5^{-1}(p)$
.25   | 0.67645 | 0.89456 | 1.05493 | 1.18776 | 1.30375
.50   | 0.82757 | 1.05751 | 1.22349 | 1.35992 | 1.47855
.75   | 1.01918 | 1.25209 | 1.42047 | 1.55788 | 1.67728
.80   | 1.07275 | 1.30614 | 1.47337 | 1.61065 | 1.72997
.85   | 1.13795 | 1.37025 | 1.53692 | 1.67388 | 1.79299
.90   | 1.22385 | 1.45399 | 1.61960 | 1.75593 | 1.87462
.95   | 1.35810 | 1.58379 | 1.74726 | 1.88226 | 2.00005
.98   | 1.51743 | 1.73699 | 1.89743 | 2.03053 | 2.14698
.99   | 1.62762 | 1.84273 | 2.00092 | 2.13257 | 2.24798
.995  | 1.73082 | 1.94172 | 2.09773 | 2.22797 | 2.34235
.999  | 1.94948 | 2.15162 | 2.30296 | 2.43009 | 2.54217
.9999 | 2.22530 | 2.41695 | 2.56244 | 2.68565 | 2.79481

$\Phi_h(x) = A_h(x^2)$ is tabled in Table 1 for $1 \le h \le 5$ and for $x$ in steps of .01 from $\Phi_h^{-1}(.001)$ to $\Phi_h^{-1}(.999)$. Tables of $\Phi_h^{-1}(p)$ for various often used values of $p$ are given in Table 2. Thus, in using the statistic $T$ (resp., $T'$) to test $H_2$ (resp., $H_1$) when the $n_j$ are large, with a test of size $\alpha$, one should reject the hypothesis when $\sqrt{T} > \Phi_{k-1}^{-1}(1 - \alpha)$ (resp., $\sqrt{T'} > \Phi_k^{-1}(1 - \alpha)$).
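The following sketch of the large-sample rule just stated is for illustration only. It assumes that $T$ is the statistic $T_N = \sup_x \sum_j n_j[S_j(x) - S(x)]^2$, with $S$ the pooled sample d.f., as defined earlier in the paper (the definition is not reproduced in this excerpt). It hard-codes the $p = .95$ row of Table 2, and the function names are hypothetical. Because all the d.f.'s are right-continuous step functions, the supremum is attained at one of the pooled observations, so only those points need to be checked.

```python
import bisect
import math

# Row p = .95 of Table 2: Phi_h^{-1}(.95) for h = 1, ..., 5.
PHI_INV_95 = {1: 1.35810, 2: 1.58379, 3: 1.74726, 4: 1.88226, 5: 2.00005}


def ecdf(sorted_sample, x):
    """Right-continuous empirical d.f. of a sorted sample, evaluated at x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def T_statistic(samples):
    """Assumed form: T_N = sup_x sum_j n_j [S_j(x) - S(x)]^2, S the pooled empirical d.f."""
    sorted_samples = [sorted(s) for s in samples]
    pooled = sorted(x for s in samples for x in s)
    best = 0.0
    for x in pooled:
        s_pooled = ecdf(pooled, x)
        best = max(best, sum(len(s) * (ecdf(s, x) - s_pooled) ** 2 for s in sorted_samples))
    return best


def reject_H2_at_5_percent(samples):
    """Large-sample test of H_2: reject if sqrt(T) > Phi_{k-1}^{-1}(.95), k = number of samples."""
    return math.sqrt(T_statistic(samples)) > PHI_INV_95[len(samples) - 1]


samples = [[0.12, 0.47, 0.81, 1.30, 1.75], [0.05, 0.66, 0.93, 1.12], [2.10, 2.45, 2.80, 3.05, 3.60]]
print(T_statistic(samples), reject_H2_at_5_percent(samples))
```

For $k = 2$ the assumed $T_N$ reduces to $n_1n_2N^{-1}\sup_x|S_1(x) - S_2(x)|^2$, the square of the Smirnov statistic, in line with the remark following (3.22).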


(Continued on p. 444) 





TABLE 3
Tables of $B_i(x)$ for $i = 1, 2, 3, 4, 5$

$B_1(x)$:
.000006  .002892  .023832  .066851  .123719  .186020  .248436  .308145
.363856  .415127  .461959  .504575  .543293  .578461  .610424  .639507
.666005  .690186  .712291  .732530  .751092  .768144  .783833  .798290
.811630  .823958  .835364  .845930  .855730  .864829  .873285  .881153
.888478  .895305  .901673  .907617  .913168  .918358  .923211  .927753
.932006  .935990  .939724  .943226  .946512  .949595  .952490  .955210
.957765

$B_2(x)$:
.000324  .001566  .004768  .010891  .020564  .034001  .051075  .071420
.094544  .119910  .146986  .175283  .204366  .233862  .263459  .292900
.321978  .350530  .378432  .405587  .431928  .457406  .481991  .505668
.528431  .550283  .571236  .591305  .610511  .628877  .646428  .663191
.679193  .694464  .709031  .722922  .736166  .748790  .760820  .772283
.783203  .793605  .803513  .812950  .821936

$B_3(x)$:
.000914  .001966  .003735  .006438  .010272  .015396  .021924  .029920
.039405  .050357  .062721  .076413  .091332  .107364  .124383  .14226
.160881  .180110  .199832  .219937  .240320  .260885  .281544  .302218
.322835  .343331  .363651  .383745  .403570  .423088  .442268  .461084
.479514  .497538  .515144  .532320  .549056  .565349  .581193

$B_4(x)$:
.000708  .001249  .002067  .003240  .004848  .006971  .009682  .013049
.017130  .021971  .027605  .034056  .041333  .049437  .058356  .068071
.078555  .089771  .101682  .114243  .127406  .141122  .155340  .170007
.185074  .200488  .216199  .232160  .248323  .264643  .281078  .297587
.314133

$B_5(x)$:
.000675  .001043  .001566  .002274  .003184  .004359  .005830  .007632
.009813  .012394  .015414  .018906  .022887  .027378  .032397  .037951
.044054  .050702  .057898  .065629  .073892  .082674  .091955  .101720
.111948  .122617





440 


0.69 





Bi (x) 


- 960167 
. 962425 
964549 
.966547 
. 968427 
.970197 
.971864 
.973433 
.974912 
.976305 


977618 


.978855 
. 980022 
.981122 
-982159 
. 983138 
. 984061 
. 984932 
. 985754 
.986530 
. 987262 
. 987954 
. 988607 
. 989224 
- 989806 
. 990356 
. 990876 
.991367 
.991831 
. 992270 
. 992684 
993076 
.993447 
.993797 
-994128 
.994441 
.994737 
.995017 
. 995282 
. 995532 
- 995769 
. 995993 
. 996205 
.996776 
. 996946 
.997107 





. 830494 
838642 
.846400 
853787 
. 860819 
867515 
.873889 
879957 
.885734 
.891233 
. 896468 
.901451 
.906195 
.910710 
-915008 
.919100 
. 922995 
. 926702 
.930231 
.933590 
.936787 
.939830 
.942727 
-945485 
.948110 
. 950608 
. 952986 
. 955250 
.957405 
.959455 
.961408 
. 963266 
. 965035 
. 966718 
. 968321 
. 969846 
.971298 
. 972680 
.973995 
.975248 
.976439 
.977574 
.978654 
.979681 
. 980660 
.981591 
. 982477 
. 983321 


J. 





| 








KIEFER 


TABLE 3—Continued 





Bs(x) 
. 596590 
.611537 
.626039 
.640097 
.653717 
. 666904 
.679663 
. 692004 
. 703933 
. 715458 
. 726589 
. 737333 
.747701 
. 757702 
. 767344 
. 776639 
. 785596 
. 794224 
. 802533 
.810532 
. 818232 
. 825641 
.832769 
. 839624 
.846217 
. 852555 
. 858647 
. 864502 
.870127 
. 875532 
. 880723 
. 885707 
. 890494 
. 895090 
. 899501 
.903735 
.907797 
.911696 
.915436 
.919024 
. 922465 
.925765 
. 928930 
.931964 
. 934874 
.937663 
. 940336 
. 942898 


L 





. 330680 
.347194 
. 363646 
. 380006 
. 396248 
-412349 
-428287 
.444042 
459597 
474935 
-490043 
. 504908 
.519519 
. 533868 
547945 
.561745 
.575262 
. 588492 
.601431 
.614076 
.626427 
. 638482 
.650242 
.661707 
.672878 
.683757 
.694347 
. 704649 
. 714668 
. 724407 
. 733869 
- 743059 
. 751980 
- 760639 
. 769038 
777183 
. 785079 
. 792732 
800145 
807326 
.814278 
.821007 
.827519 
.833819 
. 839912 
. 845803 
.851499 
. 857003 


Ba(x) 





. 133701 

.145177 

157017 

. 169195 
. 181679 
. 194449 
. 207471 

. 220721 

. 234170 
. 247790 
. 261557 
. 275444 
. 289426 
. 303480 
.317582 
.331712 
345847 
359967 
374053 
. 388088 
.402054 
.415937 
.429721 

. 443394 
.456943 
.470349 
.483607 
.496713 
509646 
. 522402 
.534981 
.547361 

559556 
.571546 
.583319 
. 594903 
. 606259 
.617411 

. 628332 
639045 
649538 
.659801 

. 669848 
.679675 
. 689284 
.698668 
. 707832 
. 716780 


' 





K-SAMPLE ANALOGUES 


__ TABLE 3—Continued 


By(x) 


984124 945353 862321 

.984889 947706 | 867459 

. 985616 949960 | .872421 

. 986309 .952120 877213 

. 986968 954190 . 881839 

.987596 956172 .886304 
.988193 .958070 | 890614 | 773472 
.988761 959889 894771 . 780754 
. 989302 961630 898782 . 787834 
989817 .963298 902651 794727 
. 990308 964895 | 906382 801427 
.990775 .966425 | 909979 807943 
.991219 .967888 .913447 .814272 
991642 969291 .916790 820424 
.992044 970632 | .920011 826397 
992427 .971916 923115 832199 
.992792 .973146 926106 | 837833 
993139 974322 928986 | . 843298 
.998957 993469 975448 931761 848602 
999011 .993784 976525 934433 853750 
.999063 994083 977557 | 937006 858742 
. 994368 978544 939484 | .863580 
994639 979488 | 941868 . 868274 
.994897 .980391 944164 872821 
995143 981256 | 946373 877227 
.995377 .982082 | .948499 | .881497 
.995599 | .982873 950544 . 885630 
995811 .983630 952512 | 889635 
.996013 984354 954405 | 893515 
996205 | 985047 | 956226 | 897268 
.996388 985708 | 957977 | 900902 
.996562 | .986341 | .959661 | .904419 
996727 .986947 961281 .907818 
996885 .987526 | . 962837 911110 
997035 988080 964334 914292 
.997178 988610 | 965773 .917370 
997313 | 989116 | .967156 | .920346 
.997443 .989600 .968485 . 923223 
997566 990063 .969762 | .926004 
.997683 .990506 | 970989 | 928692 
.997795 990929 972169 931287 
.997901 991334 | 973302 | .933797 
998002 991721 974390 . 936220 
.998098 | 992091 975435 .938560 
998190 | 992444 976439 | 940821 
998277. | .992782 977404 | .943003 
.998360 .993104 .978330 .945110 
998439 .993413 979219 947145 
998514 | .993708 .980073 949108 
99858. | .993990 980893 951002 




















442 J. KIEFER 


TABLE 3—Continued 





x | By (zx) B(x) Bs(x Balx B;(x) 
1.48 . 998654 .994259 .981680 | .952831 
1.49 . 998718 994517 982436 | . 954595 
1.50 | 998780 |  .994763 983161 |  .956298 
1.51 | .998839 .994998 983857 | 957937 
1.52 | 998895 995223 984526 | .959519 
1.53 . 998948 995437 985167 | 961044 
1.54 .998999 995643 985782 962520 
1.55 .999047 .995839 986373 | .963941 
1.56 .999093 . 996026 . 986939 .965311 
1.57 996205 987483 | 966629 
1.58 .996376 988005 | .967897 
1.59 996539 988505 | .969129 
1.60 996695 988985 | .970307 
1.61 996844 | 989445 | .971452 
1.62 .966987 989887 | .972538 
1.63 997123 | 990311 =| .973602 
1.64 .997253 990717 .974615 
1.65 997377 991106 | .975598 
1.66 997495 | 991480 | .976544 
1.67 .997608 991838 | 977450 
1.68 | .997717 992182 | .978329 
1.69 .997820 992511 | .979165 
1.70 | 997919 .992827 | 979979 
1.71 .998013 993129 | .980765 
1.72 .998103 993420 | 981511 
1.38 .998189 993698 | .982239 
1.74 .998271 993964 . 982932 
1.75 | .998349 994220 | .983606 
1.76 | 998424 994465 984252 
Lats .998496 .994700 si 984865 
1.78 .998564 994925 | 985462 
1.79 | .998629 995140 | .986040 
1.80 998692 995347 | .986590 
1.81 | 998751 995545 | .987123 
1.82 .998808 995734 | 987635 
1.83 .998862 995916 | 988124 f 
1.84 | .998914 .996090 | .988597 
1.85 998963 | 996257 989056 
1.86 | 999011 996417 989493 | 
1.87 | 999056 996570 | =. 989915 1 
1.88 996717 | 990315 
1.89 996857 | .990709 
1.90 | 996992 | .991077 | 
1.91 | 997121 | 991439 
1.92 | 997244 | 991781 
1.93 997363 | 992111 | 
1.94 997476 | .992431 } 
1.95 | 997584 | 992742 
1.96 .997688 |  .993039 } 
1 


97 .997788 993321 





| 


22822388 


WNNNNNNHNNNNNN 


2. 
2. 
2. 
2. 
2. 
2. 
2. 
2.: 
2. 
2.4 
2.2 
2. 
2. 
2. 
2. 
2. 
2. 
2.¢ 
2.¢ 
2.5 
2.3 
2.é 
2.¢ 
2. 
2. 
2. 
2. 
2. 
2. 
2. 


K-SAMPLE ANALOGUES 


TABLE 3—Continued 








B:(x) Ba(x) 





- 997883 
.997974 
. 998061 
.998145 
- 998225 
. 998302 
998375 
.998445 
998513 
.998577 
. 998639 
. 998698 
- 998754 
. 998808 
- 998860 
. 998909 
998957 
. 999002 
. 999046 





Bs(x) 


.993593 
.993853 
. 994107 
.994346 
.994577 
.994802 
.995014 
.995219 
.995417 
.995605 
.995787 
. 995963 
. 996132 
. 996290 
.996445 
. 996596 
.996737 
.996873 
. 997004 
.997131 
.997252 
.997367 
.997479 
.997584 
. 997687 
.997787 
.997882 
.997971 
. 998059 
.998143 
. 998224 
. 998298 
.998373 
. 998446 
.998512 
. 99857 

. 998637 
. 998699 
.998756 


998812 


. 998866 
.998916 
. 998962 
.999012 
.999055 





444 J. KIEFER 


The corresponding tables of B, , the limiting d.f. of W (with h = k — 1) 
and of W’ (with h = k), and of B;', are Tables 3 and 4. 

Tables 1 and 2 were computed from equation (3.21), while Tables 3 and 4 were 
computed using the form of (4.3) given in (4.4) and the paragraph following 
(4.4). A program developed at Cornell was used to obtain the Bessel functions 
by power series or asymptotic series in appropriate regions. 

As a check, the tables for h = 1 were compared with that of ®, of Smirnov 
26] and that of B, of Anderson and Darling [12]. In the case of 4, , the last 
tabled figure often differed slightly; wherever a discrepency was noted in the 
last two places, the tables were checked by differencing, and Smirnov’s appeared 
to be in error. The table of [12] checked with that of B, here. 

As mentioned in Section 5, the easiest way to compute tables of the convolu- 
tion G2 of , with itself appeared to be by numerical integration, and Table 5 
was computed in this way. Thus, for example, to test H; with size a when k = 3, 
one can use the statistic Y. + Y; of Section 5 with Cz = [nynz/(m + ne)]' and 
Cs = [ns(m + n2)/(m + me + 1; yy, rejecting the hypothesis for large n; when 
¥Y. + ¥3 > G2'(1 — a). 

Added in proof: The author has recently learned that the following inde- 
pendentiy obtained results, which overlap some of those of this paper, appeared 
somewhat after [1] and the submission of earlier versions of the present paper: 
the limiting d.f. of T'y has been considered by J. J. Gichman in Teorya Veryot- 
nostei i yeyau primenyenya, vol. 2 (1957), pp. 380-384, using an approach like 
that of [12], and two papers by L. C. Chang and M. Fisz in Science Record, vol. 
1 (1957), pp. 335-346, consider tests like those discussed in the second and fourth 
paragraphs of Section 5. 


TABLE 4 
Table of the inverses Bz" (p) 


p B;' (p B;' p) Bs" (p) B;* - B;* (9) 
.25 0.07026 0.18545 0.31472 0.45103 0.59161 
50 | 0.11888 0.27757 0.44138 0.60668 0.77253 
75 | 0.20939 0.42098 0.62227 0.81775 1.00947 
.80 0.24124 0.46640 0.67691 0.87980 1.07785 
85 | 0.28406 0.52481 0.74592 0.95734 1.16268 
90 | 0.34730 0.60704 0.84116 1.06311 1.27748 
95 | 0.46136 0.74752 1.00018 1.23730 1.46466 
98 | 0.61981 0.93320 1.20561 1.45913 1.70028 
99 | 0.74346 1.07366 1.35861 1.62263 1.87215 
995 | 0.86939 1.21412 1.51010 1.78345 2.03935 
999 1.16786 1.54027 1.85773 2.14949 2.40774 
9999 | 1.60443 2.00691 2.3495 2.66130 2.825 





1. 
- 
Le 
1. 
- 
1. 
Be 
1, 
1. 
}. 
a 
‘. 
h, 
1. 
,. 
os 
.. 
.. 
e 
& 
1. 
1. 
1. 
1. 
1.3 
BS 
1 
1.3 
Re 
1.3 
1.é 
- 
1.: 
1.é 
1.é 
1.3 
1.é 
l.é 
1.é 
1.é 
1.4 
Ry 


-0028 
-0034 
-0040 
.0048 
.0056 
.0065 


0076 


.0087 
.0100 
-O115 
.0131 
.0149 
.0168 
.0189 
.0212 
.0238 
.0265 
.0294 


0326 


.0359 
.0395 
.0434 
.0475 
-0528 
0564 
.0612 
.0663 
.0717 
.0773 
.0832 
.0893 
.0957 
. 1023 
. 1092 
. 1164 
. 1237 


1314 


. 1392 
. 1474 


eee 


. 1557 

. 1642 
. 1730 
. 1820 


TABLE 5 


Table of G:(z) 


G2(x) 


. 2005 
.2100 
.2197 
. 2295 
. 2396 
. 2497 
. 2601 
. 2705 
. 2811 
.2917 
. 3025 
.3133 
3242 
.3352 
. 3463 
.38573 
. 8685 
. 3796 
. 3909 
.4020 
.4133 
.4244 
4356 
.4467 
.4579 
4689 
.4801 
.4910 
. 5020 
.5127 
. 5236 
.5342 
.5449 
.5554 
.5658 
5761 
.5864 
. 5964 
. 6064 
.6162 
.6260 
.6355 
.6451 
.6543 
. 6636 
.6726 
.6816 
.6902 
.6989 
. 7073 


mt ett tht 


s 


NS Ww & to 


bh 


NNNNN NNN NN DD tO 


nh 


NNN N NN N DW W tH 


ot bo 


tb 


i) 


No Nw hw Ww NW NW NW W b bo 


nN bw 


to 


bo 


~ 


~ 


NNNNNN NWN NW WD 


NNN NNNNNNNNNNNHNNNNNN WI 


— Ss 


wh 


NJ J 5] 5] 9) 


sys -J 
OND 


2s 


NS b 


- 9586 
. 9605 
- 9638 
. 9653 
. 9669 
. 9682 
.9697 
.9709 
.9723 
.9734 
9747 
-9757 
.9769 
.9779 
.9790 
-9798 
. 9808 
-9816 
.9826 
9833 
.9841 
.9848 
9856 
. 9862 
. 9869 
. 9874 
-9881 
- 9886 
. 9892 
. 9896 
.9902 
- 9906 
-9912 
.9915 
. 9920 
.9930 
. 9934 
. 9937 
.9941 
-9943 
.9946 
.9948 
-9952 
. 9953 





446 J. KIEFER 


REFERENCES 
[1] J. Kierer, ‘Distance tests with good power for the nonparametric k-sample problem”’ 
(Abstract), Ann. Math. Stat., Vol. 26 (1955), p. 775. 
(2) W. H. Krusxat, ‘‘A nonparametric test for the several sample problem,” Ann. Math. 
Stat., Vol. 23 (1952), pp. 525-540. 
[3] I. R. Savage, “Bibliography on nonparametric statistics and related topics,’’ J. Amer. 
Stat. Assn., Vol. 48 (1953), pp. 844-906. 
[4] P. DeMunrTer, ‘“‘Consistance des tests non-paramétriques pour la compardison d’échan- 
tillons’’ Acad. Roy. Belgique Bull. Cl. Sci., (1954), pp. 1106-1119. 
[5] A. N. Kotmoacorov, ‘Sulla determinazione empirica delle leggi di probabilita,’’ Giorn. 
Ist. Ital. Attuari, Vol. 4 (1933), pp. 1-11. 
[6] R. von Mises, Wahrscheinlichkeitsrechnung, Deuticke, Vienna, 1931. 
[7] N. Saagnov, ‘‘On the estimation of the discrepancy between empirical curves of dis- 
tribution for two independent samples,’’ Bul. Math. de l'Universite de Moscou, 
Vol 2 (1939), fase. 2. 
[8] E. L. Lenwann, “Consistency and unbiasedness of certain non-parametric tests,”’ 
Ann. Math. Stat., Vol. 22 (1956), pp. 165-179. 
[9] M. Rosensuarr, “Limit theorems associated with variants of the von Mises statistic,”’ 
Ann. Math. Stat., Vol. 23 (1952), pp. 617-623. 
(10) W. Feuer, ‘On the Kolmogorov-Smirnov limit theorems for empirical distributions.” 
Ann. Math. Stat., Vol. 19 (1948), pp. 177-189. 
{11] M. Kac, ‘“‘On some connections between probability theory and differential and integral 
equations,’’ Proceedings of the Second Berkely Symposium of Mathematical Statis- 
tics and Probability, University of California Press, 1951, pp. 180-215. 
{12} T. W. ANDERSON AND D. A. Darina, ‘“‘Asymptotic theory of certain ‘goodness of fit’ 
criteria based on stochastic processes,’’ Ann. Math. Stat., Vol. 23 (1952), pp. 193- 
212. 
[13] J. L. Doon, ‘‘Heuristic approach to the Kolmogorov-Smirnov theorems,’’ Ann. Math. 
Stat., Vol. 20 (1949), pp. 393-403. 
[14] M. L. Donsxer, ‘Justification and extension of Doob’s heuristic approach to the 
Kolmogorov-Smirnov theorems,’’ Ann. Math. Stat., Vol. 23 (1952), pp. 277-281. 
[15] J. Kisrer ano J. WoLrow17z, ‘‘On the deviations of the empiric distribution function 
of vector chance variables,’’ Trans. Amer. Math. Soc., Vol. 87 (1958), pp. 173-186 
[16] A. Dvorerzxy, J. Krerer, anv J. WoLrowirTz, ‘‘Asymptotic minimax character of the 
sample distribution function and of the classical multinomial estimator,’’? Ann. 
Math. Stat., Vol. 27 (1956), pp. 642-669. 
[17] M. Rosensuatr, ‘‘On a certain class of Markov processes,’’ Trans. Amer. Math. Soc., 
Vol. 71 (1951), pp. 120-135. 
[18] A. Erp&xyi et al, Higher Transcendental Functions (3 vols.), McGraw-Hill, New York, 
1953-1955. 
[19] A. Erp&xy1 et al, T'ables of Integral Transforms (2 vols.), McGraw-Hill, New York, 1954 
[20] G. N. Watson, A Treatise on the Theory of Bessel Functions, 2nd edn., Cambridge 
University Press, Cambridge, 1944. 
[21] F. J. Massgy, Jr., ‘‘Distribution table for the deviation between two sample cumula 
tives,’’ Ann. Math. Stat., Vol. 23 (1952), pp. 435-441; also Vol. 22 (1951), pp. 125- 
128. 
[22] M. Fisz, ‘‘A limit theorem for empirical distribution functions,’’ Bull. de |’ Acad. Pol. 
des Sci., Vol. 5 (1957), pp. 695-698. 
[23] M. Kac, J. Krerer, anp J. WoLrowi7, ‘‘On tests of normality and other tests of good 





K-SAMPLE ANALOGUES 447 


ness of fit based on distance methods,’’ Ann. Math. Stat., Vol. 26 (1955), pp. 189- 
211. 

[24] D. A. Daruina, ‘““The Cramér-Smirnov test in the parametric case,” Ann. Math. Stat., 
Vol. 26 (1955), pp. 1-20. 

[25] H. Davin, “‘A three-sample Kolmogoroff-Smirnov test,’’ Ann. Math. Stat., Vol. 29 
(1958), pp. 842-851. 

(26] N. Smirnov, “Table for estimating the goodness of fit of empirical distributions,” 
Ann. Math. Stat., Vol. 19 (1948), pp. 279-281. 





ASYMPTOTIC EXPANSIONS FOR THE SMIRNOV TEST AND 
FOR THE RANGE OF CUMULATIVE SUMS! 


By J. H. B. KemperMan 
Purdue University 


Summary. Let z, denote the position at time n of a particle describing a one- 
dimensional random walk, such that the increments [, = z, — 2n-, (n = 1, 
2, ---) are independent random variables, assuming only the values +1 and —1, 
each with probability 4. Of considerable importance in many applications is 
the conditional probability 


Pa(t,j,c) = Pla, = j3,0 <2n <eo,m=1,---,n| m= 1); 


here, i, j, c, nm denote positive integers. In section 1, an asymptotic development 
for pa(i, j, ¢) is given; for each positive integer m, it yields an approximation 
to pn(i, j, ¢) with error smaller than Cn” where C is independent of i, j, c and 
n. As a simple application, an asymptotic development for the binomial coeffi- 


cient (" isderived by letting 7, j, c tend to infinity in such a manner that j — i = 


2s — n. 

As a second application, an asymptotic expansion is derived for the joint dis- 
tribution of the extrema of the difference between the empirical distributions of 
two samples of size n. 

The above asymptotic development for p,(i, j, c) is obtained by applying 
the central Lemma 4 to an exact formula for p,(7, j, c). In Section 5, using this 
formula, an exact formula is obtained for the distribution of the range R, of 
the n + 1 numbers 2, --- , 2... Applying Lemma 4 to it, a complete asymptotic 
expansion for the distribution of R, is derived. 


1. Main result. Consider a random walk z), 2, --- of independent incre- 
ments ¢, = Zn, — 2n-1, such that 


P(S, = +1) = P(E. = —1) = }, (n = 1,2,---). 
In the sequel, n, i, j, c always denote integers withn 2 0,0 <i<c,O Sj Se. 
Let p,(i, j, ¢) denote the conditional probability, given z = 7, that z, = j 
and 0 < z, < c for m = 0, 1, --- , n. Observe that p,(i, j, c) = O unless the 
integers 7 — i and n are of the same parity. 
It is well-known that 


, . . ia —n = nm 
Pn(t, j, c) = 72 2. IG +j —it+ 2kc) ») 


nM 
by + 7+ 1+ 2ke) 2). 
Received April 15, 1958; revised October 13, 1958. 


1 This research was supported in part by the National Science Foundation, Grant NSF 
G-1979. 


(1) 





448 





ASYMPTOTIC EXPANSIONS 


if n+%+ j is even. Moreover, 


c—1 


(2) Pn(i, j, €) = (2/c) >> sin kwi/e sin kxj/e (cos kr/c)”. 
k=l 


A simple proof of (1) and (2) is as follows. Let c and i be fixed; then the func- 
tion p,(t, Jj, ©) is uniquely determined by the obvious relations 


Pasi(t, j,¢) = [pa(t,j — 1,¢) + palit, j + 1, €))/2, (0 <j <c), 


Pr(i, 0, c) = pali,c,c) = 0 and pi, j, c) = 1, iff i = j, = 0 if i x j. But 
it is easily verified that the function defined by the right hand side of (2), (or 
(1), respectively), satisfies all these relations. 

If s(k) denotes the kth term in the right hand side of (2) we have s(c — k) = 
(—1)**’*"s(k); moreover, s(c/2) = 0 if cis even. Hence, (2) may be written as 

[(e—1) /2) 

(3) pali,j,c) = (4/c) >> sin kwi/c sin kxj/c (cos kx/c)" 
if i+ j+ n is even, (pa(t, Jj, c) = 0, otherwise). Using (3), we shall derive 
an asymptotic development for p,(i, j, c) with a remainder O(n-™+) holding 
uniformly with respect to all the parameters i, j and c, (m = 1, 2, ---). 

More precisely, let 


7 (2" ay 1) 
A3( * 
(4) : (2v) (2r)! 
where B, > 0 denotes the vth Bernoulli number, (B,; = 1/6, B, = 1/30, B; = 
1/42, By = 1/30, Bs = 5/66, --- ; Ao = 1/2, Ay = 1/12, Ap = 1/45, A; 
17/2520, Ay = 61/28350, A; = 691/935550). Further, let 
(5) Ags = 20’ Al +++ Agt(nil +++ opt); 
where the summation is extended over all the sets (1, --- , ») of non-negative 
integers which satisfy 

Atwmt::-+y=h; y+ Qt ees + oy, = 


Thus, Ao = 1 and A,, = Oif h > wu. Further, for uw = 1, Aw = 
Ay, = (12)™/u!; also Age = A,;Ao = 1/540. 
Finally, let 


B,, (vy = 1,2,--- 


(6) H(z) = ($) ce? = A(z) &*”, 


(r = 0,1, --: ). For instance, Ay = 1, Ae = 2 — 1, Ay = x‘ — 62" + 3, 


A, = a° — 15a* + 452” — 15. In general, 


A(x) = (2r)! >> (—2)72" /(v(2r — 2y)!). 


We can now state the main result concerning p,(%, j, ¢). 





450 J. H. B. KEMPERMAN 


THEOREM 1. Let 


(7) gr = 4(a/x)' >> sin kwi/e sin kxj/c e * (2ak’)’, 
k=l 
where, for brevity, 
a = rn/(2c). 


A formula equivalent to (7) is 


(8) ge = (—1)) D> (Ahn 4G — i + 2ke)) — H2(n*(j + ¢ + 2ke))I, 
k=—awo 
ifr = 0, 1, ---. Finally, let 
m—1 u 
(9) a Pa(t, J; ¢) — (2/x) yp os 2 (—1)*Ayrgusn ’ 
u=0 h=0 
(m = 1, 2,--- ). Then, for each integer m = 1 and each constant K > 0, there 


exists a number M > 0, depending on m and K, but independent of i, j, c, n, such 
that 
(10) [Um | S M(e*™ +n” “(1 + a°"*")), 


for each choice of the integers i,j, c, n withi +j + neven,0 <i<c,0 S7 
n > 0. 


lA 


2. Auxiliary results. Proof of Theorem 1. 

Lemma 1. Let —log cos wi = w/2 + w'/12 + --- denote the analytic function 
for | w| < x'/4, which assumes real and positive values for w real, 0 < w < 4/4, 
and let 


(1) ¢g(w) = (— log cos w' — w/2)w”. 
Then o(w) 2 0 for w real and positive, w < 2/4. Moreover, we have the Taylor 
expansion 
2 u 
‘ ug(w) h -h 
(2) ene) = Amu, 
p=0 h=O 


holding for | w\| < x /4 and arbitrary u, where the A,, are as defined above. 

Proor. Let A; > 0 be defined by (1.4), especially, Ao = 4}. Integrating the 
well-known expansion tan z = ae (2» + 2)A.x”™, ( |a2| < 2/2), we obtain 
—logcosz = 2°/2 +07, A.w”**. Hence, from (1), ¢(w) = w De a, 
(| w|< 2/4). The above assertions now easily follow. 

Observe that, from (1), formula (1.3) may be written as 

9 ((e=1)/2) 
pat, j,¢) = = 


k=l 


gy Fhe ed ta * uo ie 
. (cos ee(j — $) — cos (HG +4) ey (- : (ak*Y*, = at), 
c c n n 


(3) 





ASYMPTOTIC EXPANSIONS 


where a = 2'n/(2c’) and 


o 4 
(4) ¥(u, w) = 8 = DF DY Aww, (|w| < #/4). 

p=0 h=0 
The proofs not being any more difficult, and in view of the proof in Section 5, 
we shall determine the asymptotic behavior (for small values of | | and | 7 | ) 
of more general sums of the type 

d.  cos kx &™'f(o(Bk*)”, +(Bk*)*). 
1<k<d 

Here, f(u, w) denotes a fixed analytic function for |u| << w,0< |w|< wo, 
(tw > 0, wo > 0), admitting the expansion 


~ “ 


(5) f(u, w) = w* pe = B,xu'w*, 


p=0 h=0 
(|u| < uw,0<|w| < wo), where s denotes an integer and the B,, are com- 
plex constants. 
LemMa 2. Let m denote a fixed non-negative integer, and let 


m “ 
R(m) = f(u, w) — w* > DY By. 
p=0 h=0 
Then to each pair of positive constants uw, and w, with uy, < Uo , Wi < Wo there corre- 
sponds a constant M, independent of u, w, such that 


wR(m)| Ss M(\u|"+)|w]”), 


whenever|\u| Sm,\w| Sw. 


Proor. Let |u| S m,|w|S w, and put 6 = Max(|u|/m,| w/w), 
6 < 1. We have 


o “ ° 


“ 
| w'R(m)| = | >> >> Buu *w**| < 0" >> >| By | utwt” = Ke”. 
h=0 


pom h=0 —m 


Lemma 3. To each real number r ~ —}4 there corresponds a constant M, such 
that, for each choice of the positive numbers B and i, 


Le (ak) = Mate (1 + (8n’)™), ifr > —4, 


(6) ee 


< Mp *e~™*(gy?)'"*, ifr< —}. 


Proor. Let S(8, \) denote the left hand side of (6). In the proof we may 
assume that \ 2 1. For, suppose the lemma has been proved for this case. Then, 
for0 <A < landr < —}, 


S(6,) = S(B, 1) s Mp te *p"* < Me te ™*(an’)'™. 


On the other hand, let 0 < A < l andr > —}. Then S(8, 4) = S(6,1) Ss 
Msp e*(1 + 9’). If 2° < X < 1 we have 


e?(1 + et) < 6 F(1 + (26n7)"*) < 27-1 + (an?)). 








452 J. H. B. KEMPERMAN 


Further, for 0 < 4 < 2%, 81 + a’) s ee" + Bt) < Ke™, if 
K denotes the maximum value of the function ¢*?(1 + 8"), 6 > 0. 

Thus, let \ = 1. The function f(z) = e~**"(8x")’, (x > 0), is decreasing if 
r < O and, for r > 0, has a unique maximum at 2% = (r/8)', where f(x) = C, 
C denoting a constant independent of 8. Further, if r > 0, f(z) is increasing for 
0 < x S 2% and decreasing for x = x. Hence, letting 


(7) I - | e *™ (gx*)’ dr, 
» 
we have 
(8 S(6,\4) I+ C, ifr > OandaA < x, 
) 


<sI+ e ™* (Bn*)’, ifr SO0orA = XH. 


Ifr > OandaA < 2 = (r/B)! we have, from \ = 1, that 8 < Bd’ < r, hence, 

— ~ - r —4 —Bar? 2\r 
C s Ce ™*B/r) ' In any case, from A 2 1, e 8x" Bn?) = B 1m (Bd*) 4 
Finally, letting Bx” = y in (7), I = 6 *J(Bd")/2, where 


(9) J(w) - | ey" dy. 


It follows from (8) that it suffices to prove the existence of an absolute con- 
stant M, such that, for w > 0, 
J(w) S$ Me“(1 + w™), ifr > —}, 


(10) he 

s Me “w'™, ifr < —3% 
Letting y = w(1 + z) in (9), we have J(w) = €°w' fee "(1 + z)"* dz. 
This proves (10) when either r < —} or r > —}4, w 2 c, c denoting a fixed 


positive constant. Finally, if r > —4, w <S c, we have 
J(w) = Tir+ 4) S Tir+ })e. 
Lemma 4. Co~sider the sum 


(11) S = DO coskx e™f(o(Bk*)”, 7(8k’)*), 
1<ksh 

where o and +r denote complex numbers, x a real number, , 8 positive real numbers, 
Pp, q non-negative integers; (S = 0 if X < 1). Further, let By, , 8, uo, Wo be as in 
(5), and 


m—l 4p 00 
¥ h_p-h— —Bk2 2\ ph+q(u—h— 
(12) S. =>, >> Buc'r’ ">, coskre ™ (pk )™ to *. 
p=0 h=0 k=1 
Assertion: to each choice of the integer m > 0 and the positive numbers u, < uo , 
w; < wo there corresponds a constant M > 0, independent of X, x, B, o, +, such 
that 


|S — S.| s Me*|+|~“{e-™*(6n?)-#4 
+e Pig 1"(1 + 8?” wots) e? iel*ai or"), 





ASYMPTOTIC EXPANSIONS 453 


for each choice of the parameters , x, B, « and 7, satisfying \ > 0,8 > 0, BX = 1 
and 


(13) |o \(8X")? Sum, |7|(6X)* Sw. 
Finally, the same assertion holds true if in (11) and (12) the summation variable 
k is restricted to the odd integers. 

Proor. In the following, M denotes a positive constant, independent of \ 


x, B, o, 7, not necessarily the same constant on each occasion. From Lemma 2 
using (13),forl Sk SX, 


f(o(Bk’)”, 7(Bk")*) pm > > Byo'r p—h— *(Bk?)hteu-a—o a ena 


with | anx M\r\|~“(\o| "(k’)?”™™ +)|7|™(Bk)*"™). Hence, from 

oe 1, (whether or not & is restricted to the odd positive integers), 
|\S—S.|< T: + T:, where T; = M\7\~* ‘Dee (|¢ | "(Bk?)?""* + 
|r| (ak? yer), and T: = Dime Dihno | Buo'r” "| Dinan e Mak?) te, 
From pr* = 1 and Lemma 3, 


is 
s 


T> > Mgt > | o'r y—h—s | \(Bd Street +8 


p=0 h=0 
s Mpte™" | | ~*(), 
from (13). Further, from Lemma 3, applied with A = 1, 


Tl, s Mp te* lr] “(lof "1 + arnt) +|7r|/"1+ gre). 


yielding the stated assertion. 
Lemma 5. For B > 0,r = 0,1, --- , we have 


8h + 23° cos kx e*™*(2¢K2)" = (—1)"(x/B)! x Hz, A) 
. S k=l a Pat k k=—o@ F +/ 28 : 
where H2, is defined by (1.6). 
° 5 —y?2/2 . . . 

Proor. In view of Ho(y) = €”” and 4) = 1, the special case r = 0 is equiv- 
alent to a well-known identity for theta-functions. Differentiating 2r times 
with respect to x, the general result immediately follows. 

Proor OF THEOREM 1. Let 7, j, n, c denote integers,0 <i<c,0 Sj Se, 


n> 0,i + j7 + neven. Then p,(i, j, c) is given by (3) and (4), where 

(14) a = wn/(2c). 

Further, let m denote a given positive integer, K a given positive constant. It 
suffices to prove (1.10), (with M depending only on m and K), under the addi- 
tional restriction that 

(15) n=2K*21 


(for, letting afterwards K = 1, the general result immediately follows). Let 





454 J. H. B. KEMPERMAN 


\ > O be defined by 


(16) ad? = Knit 
hence, 
(17) (wd/c)* = 2n “ad? = 2n'K < 2 < #°/4, 


thus, 4 < c/2. From Lemma | and (4), 
9 Y 
Osy (- : (ak*)?, . at) ms: 0<k<ce/2, 
7 


thus, the contribution to the right hand side of (3) of the terms with k > X is 

at most equal to (4/c)(c/2)e = 2°" . Consequently, from (3) and (14), 
pa(t, j,¢) = O(e*™™) 

+ (2/r)(2a/n)*(S(4(j — i)/e) — S(w(j + i)/e)), 


where S(x) = Dish) cos kr e“"y( —(4/n)(ak")*, (2/n)ak*). In order to es- 
timate the latter sum, we apply Lemma 4 with \ as above, 8 = a, o = —4/n, 
r = 2/n, p = 2,q = 1, f(u, w) = W(u, w), s = 0, wm = 4K’, (w arbitrary, 
Uo > %), W, = 2 < #*/4 = wy. Then (13) holds, from (17) and (4/n)(ad*)” = 
4K* = uw. Moreover, ad’ = Kn' => K 21. Hence, using Lemma 1, Lemma 4 
yields that, (for real values x), 


(18) 


(19) | S(x) — S,(x)| S Ma?*(nte*™ arr +a). 


where M denotes a constant depending only on K and m. Here, 


m—l uw oo 
(20) S,.(z) = _ 7. Ayn(—4/n)*(2/n)""* >> cos kre ** (ak*)*™. 
= k=l 


Theorem 1 is an immediate consequence of (18), (19), (20) and Lemma 5, 
the latter implying the equivalence of (1.7) and (1.8). 


3. Asymptotic expansion of the binomial coefficient. Let p,(?, 7) denote the 
conditional probability, given 2. = i, that z, = j, 2 > 0 fory = 0,1,---,n 
thus, 


, 


(1) Pn(t,j) = lim p,(i, j, c). 
From (1.8), lim... (—1)'g, yy ie a G —i))- H3(n*(j + 7)), hence, f 
from Theorem 1 and (1), ifz > 0,7 20,n +72+ 7 even, 
Pn(t,j) = (2/e >> (—1)*n”” - AwlHena2n(n*(j — i) 
(2) sia 
— Hinan(n Gj + t))| a Um 
where 


(3) |um| S Mn-”"* 


’ j 





ASYMPTOTIC EXPANSIONS 


M denoting a constant independent of i, 7, n. Further, 


(4) pa(i,j) = 2 I@ ¥ j- na » b, + i+ sia) 


(e.g., from (1) and (1.1)), if > 0,7 2 0,n + i + j even. Keeping n fixed 
and letting 7, 7 tend to infinity, such that n + 7 — i = 2s, s an integer, we have 
from (2), (3) and (4), 


m—1 » 
(5) 27 (") = (2/e)'S (= 1) FY Ay Hoggan (28 — 0) + O(n), 
u=0 = () 


the remainder holding uniformly in s and n. An alternative proof of (5) might 
be obtained by starting with Stirling’s formula or from an application of a gen- 
eral theorem of C. G. Esseen (Acta Mathematica, Vol. 77 (1945), p. 63). 


4. The Smirnov test with equal sample sizes. Let x, ,--- , 2n, Yi, °°" 5 Yn 
denote 2n independent observations on a real random variable having a con- 


tinuous distribution. Further, let 


F,(s) = 2. 1/n, F.(s) = +» 1/n 


zis vise 


denote the empiric distributions of the samples 2, --- , %, and y,-°-:, Ya; 
respectively. Finally, let 


(1) P,(a, b) = Prob (—a/n < F2(s) — F,(s) < b/n for all s), 


where a, b denote positive integers or +2. It is not difficult to show, 
ef. Gnedenko and Korolyuk [4], that, irrespective of the underlying distribu- 
tion, 


o 2 
(2) oo ( ) P,(a,b) = pon(a,a,a + b), 


where p,(i, j, ¢) is precisely the quantity studied in the previous sections. 
Hence, from (1.1), 


2n\ > _< an \ 2n 
(” ) P.(a, b) = 2 (, oS J (, +a+t+ a 


where c = a + b, a result due to Gnedenko and Rvaéeva [5]. Moreover, from 
(2) and (1.3), 


[(e—1) /2] 


4 2 9 2 
(3) s~ (°") P,(a,b) = (4/c) >) (sinkwa/c)’ cos kx/c)*, 


k=] 
where c = a + b, especially, 
[a/2 


9 } 
(4) =~ (°") P,(a,a) = (2/a) > (cos (k —})x/a)*. 
k=l 


n 


Massey [6] gave a table of P,(a, a) forn < 40, a < 13, (in his notation, a = 





456 J. H. B. KEMPERMAN 


k + 1). In computing this table from (4), the resulting series would contain 
at most six terms, most of which are neglegibly small for (say) n 2 10. 


Applying Theorem 1 to (2), one obtains an asymptotic development for 
P,,(a, b). In fact, let 


(5) gr = 4(a/x)' >> sin kea/c sin keb/c & “"(2ak*)’, 
k=l 


where c = a + b, a = n/c’; an equivalent formula is 
(6) (—1)'g- = >> {H2-(2ke(2n)*) — H%,((2a + 2ke)(2n)*)}, 


kom 


(both formulae being especially simple if a = b). Then, for each fixed integer 
m = 1, there exists a constant M, independent of a, b, n, such that 


| m—l - 
i (3) P,(a,b) — (2/x)* >> (2n)** ¥ (-1)"Ays Dots | 
(7) | n pao h=o 


<s Mn™™* 


holds true for each choice of the positive integers a, b, n. Moreover, 
from Stirling’s formula, for n large, 


on { 2n oy } , hee 2) wf An? 
(8) 2 “i ~ (#n)’(1 + 1/(8n) + 1/(128n") — 5/(1024n") + ---). 


Combining (7) and (8), one obtains an asymptotic expansion of P,(a, 6) in 
powers of 1/n with a remainder O(n”) holding uniformly with respect to the 
integers a and b, (m = 1, 2, --- ). For instance, the special case m = 4 yields 


P,(a,b) = go + (3go — g2)/(24n) + ($90 — 3g2 — 'gs + 439s) /(24n F 
+ (—+2290 — $9: — +293 — ba + 4895 — kgs) /(24n)* + Of n*). 
The weaker result 
P,(a, b) = go + (390 — ge)/(24n) + of n*) 


is due to Gnedenko [3]. 
Finally, from (2), (3.1), (3.2), we have the expansion 


o~ ing P,(a, © ) 


n 
~ (2/e)* SS (—1)*(2n) 7"? YS As Hd (0) — HE (2an)}. 
u=0 h=0 


Using (8), one obtains results of the type 


(10) P,(a,o) =1—e® "1 + a(1 — a’/(3n))/(2n*)} + O(n”), 


the remainder holding uniformly in a. Here, (10) contains a result due 
to Gnedenko [3]. 





ASYMPTOTIC EXPANSIONS 457 


Remark. The reader should note that (9) holds only for positive integer 
values of a and b. For instance, from (9) and (5), 


(11) P,(a, a) = Go(x'n/(4a")) + O(n"), 


where Go(x) = 4(a/x)'(e™ +e + om +... ), Suppose that one wants 
to choose the integer ag such that P,,(ag , ag) is close to a given number 8, (say, 
68 = .95). From existing tables, one can find zo such that Go(x) = 6. Now, 
for the reasonable choice of ag as the smallest integer =2/2(n/xo)', one can 
only say that x’n/(4a’) = xz» + O(n), thus, from (11), P.(as, a3) = 6 + 
O(n). On the other hand, if n is large and ag has been chosen, (say) in the 
above manner, formula (9) will yield an excellent approximation to P,,(ag , ag). 


5. The range of cumulative sums. Let $\xi_1, \xi_2, \ldots$ be independent random variables, each assuming only the values $+1$ and $-1$ with equal probability. Further, let $R_n$ denote the range of the cumulative sums $z_0, \ldots, z_n$ ($z_m = \xi_1 + \cdots + \xi_m$, $z_0 = 0$). Thus, $R_n = U_n + V_n$, where

$-U_n = \min(z_0, \ldots, z_n)$, $V_n = \max(z_0, \ldots, z_n)$.

Note that $R_n$, $U_n$ and $V_n$ assume only the values $0, 1, \ldots, n$. In this section, by applying Lemma 4 to the exact formula (1) below, we shall obtain a complete asymptotic expansion for the distribution of $R_n$. For each positive integer m, it yields an approximation to $P(R_n < r)$ with error smaller than $Cn^{-m+1/2}$, C denoting a constant independent of n and r.
LEMMA 6. We have

(1) $P(R_n < r) = A_{r+1}(n) - A_r(n)$, $(r = 1, 2, \ldots)$,

where

(2) $A_c(n) = \dfrac{1}{c}\sum_{k=1}^{c-1}(1 - (-1)^k)\cot^2\dfrac{k\pi}{2c}\left(\cos\dfrac{k\pi}{c}\right)^n$.

PROOF. From the definition of $p_n(i, j, c)$ in Section 1 (replacing $z_m$ by $z_m' = a + z_m$), we have, for positive integers a, b,

$P(U_n < a, V_n < b, z_n = j - a) = p_n(a, j, a + b)$.

Further, $p_n(a, j, a + b) = 0$ if $j \le 0$ or $j \ge a + b$; hence,

$P(U_n < a, V_n < b) = \sum_{j=1}^{a+b-1} p_n(a, j, a + b)$.

Moreover, for $r = 1, 2, \ldots$,

$P(U_n + V_n < r) = \sum_{a=1}^{r}\{P(U_n < a, V_n < r - a + 1) - P(U_n < a, V_n < r - a)\}$.

From $U_n + V_n = R_n$ and $P(U_n < r, V_n < 0) = 0$, it follows that (1) holds with





458 J. H. B. KEMPERMAN 


$A_c(n) = \sum_{i=1}^{c-1}\sum_{j=1}^{c-1} p_n(i, j, c)$. Using (1.2), the latter formula easily implies (2).
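For small n, formulas (1) and (2) are easy to check against a brute-force count. The sketch below enumerates all $2^n$ equally likely paths, which is feasible only for small n. It computes the exact distribution of $R_n$ and compares $P(R_n < r)$ with $A_{r+1}(n) - A_r(n)$, where $A_c(n) = c^{-1}\sum_{k=1}^{c-1}(1-(-1)^k)\cot^2(k\pi/(2c))(\cos k\pi/c)^n$. The helper names are hypothetical.

```python
import math
from itertools import product

def range_distribution(n):
    """Exact P(R_n = r), r = 0..n, by enumerating all 2^n equally likely paths."""
    counts = [0] * (n + 1)
    for steps in product((1, -1), repeat=n):
        z = lo = hi = 0  # z_0 = 0
        for step in steps:
            z += step
            lo, hi = min(lo, z), max(hi, z)
        counts[hi - lo] += 1  # R_n = V_n + U_n = max - min
    return [cnt / 2 ** n for cnt in counts]

def A(c, n):
    """Formula (2); only odd k contribute, each with weight 1 - (-1)^k = 2."""
    return sum(2.0 * math.cos(k * math.pi / c) ** n / math.tan(k * math.pi / (2 * c)) ** 2
               for k in range(1, c, 2)) / c

def prob_range_below(r, n):
    """Formula (1): P(R_n < r) = A_{r+1}(n) - A_r(n), r = 1, 2, ..."""
    return A(r + 1, n) - A(r, n)

n = 10
dist = range_distribution(n)
for r in range(1, n + 2):
    print(r, sum(dist[:r]), prob_range_below(r, n))
```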


Remark. A formula equivalent to (2) is

$A_c(n) = 2^{1-n}\sum_{0 \le m \le n/2}\binom{n}{m} f_c(n - 2m)$.

Here, $f_c(x)$ is defined by $f_c(0) = (c - 1)/2$, $f_c(h) = c - 2h$ $(h = 1, \ldots, c - 1)$, $f_c(c) = -c + 1$, $f_c(sc + h) = (-1)^s f_c(h)$ $(s = 1, 2, \ldots;\ h = 1, \ldots, c)$. We omit the proof.
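The binomial form in the Remark can be checked in exact rational arithmetic. The sketch below takes the normalizing factor to be $2^{1-n}$ (our reading of the Remark) and uses $f_c$ exactly as defined there. It then compares the result with the trigonometric sum (2). The helper names are hypothetical.

```python
import math
from fractions import Fraction

def f_c(c, t):
    """f_c(t) for integer t >= 0, following the definition in the Remark."""
    if t == 0:
        return Fraction(c - 1, 2)
    s, h = divmod(t - 1, c)
    h += 1  # now t = s*c + h with 1 <= h <= c
    base = c - 2 * h if h < c else 1 - c
    return (-1) ** s * Fraction(base)

def A_binomial(c, n):
    """Exact A_c(n) = 2^{1-n} * sum_{0 <= m <= n/2} C(n, m) f_c(n - 2m)."""
    total = sum(math.comb(n, m) * f_c(c, n - 2 * m) for m in range(n // 2 + 1))
    return Fraction(2, 2 ** n) * total

def A_trig(c, n):
    """Formula (2) in floating point; only odd k have 1 - (-1)^k = 2 != 0."""
    return sum(2.0 * math.cos(k * math.pi / c) ** n / math.tan(k * math.pi / (2 * c)) ** 2
               for k in range(1, c, 2)) / c

print(A_binomial(4, 1), A_trig(4, 1))
```

Because every term is rational, the binomial form gives $A_c(n)$ exactly. This reflects the fact that $A_c(n) = \sum_{a=1}^{c-1}P(U_n < a, V_n < c - a)$ has denominator $2^n$.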

Transforming in (2) the terms with $k > c/2$ to the summation variable $k' = c - k$, we have

(3) $A_c(n) = \dfrac{1}{c}\sum_{k=1}^{[(c-1)/2]}\left\{(1 - (-1)^k)\cot^2\dfrac{k\pi}{2c} + (-1)^n(1 - (-1)^{c+k})\tan^2\dfrac{k\pi}{2c}\right\}\left(\cos\dfrac{k\pi}{c}\right)^n$.
Applying Lemma 4 to (3), one may derive the asymptotic expansion of $A_c(n)$ for large n. For convenience, we shall restrict ourselves to the case that, in (3), n is an odd positive integer and c an even positive integer; thus, from (3),

$A_c(n) = \dfrac{8}{c}\sum_{1 \le k \le c/2 - 1,\ k \equiv 1 \ (\mathrm{mod}\ 2)}\operatorname{cosec}^2\dfrac{k\pi}{c}\left(\cos\dfrac{k\pi}{c}\right)^{n+1}$.
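The passage from (2) to (3), and from (3) to the cosecant form for odd n and even c, can be checked numerically. The sketch below codes (3) with an explicit factor $(-1)^n$ in the tangent term. That factor is what the substitution $k' = c - k$ produces, since $(\cos k'\pi/c)^n = (-1)^n(\cos k\pi/c)^n$. The sketch is meant for $n \ge 1$, where the omitted term $k = c/2$ for even c vanishes. The helper names are hypothetical.

```python
import math

def A_2(c, n):
    """Formula (2)."""
    return sum((1 - (-1) ** k) * math.cos(k * math.pi / c) ** n / math.tan(k * math.pi / (2 * c)) ** 2
               for k in range(1, c)) / c

def A_3(c, n):
    """Formula (3): terms with k > c/2 folded onto k' = c - k (n >= 1)."""
    total = 0.0
    for k in range(1, (c - 1) // 2 + 1):
        x = k * math.pi / (2 * c)
        bracket = ((1 - (-1) ** k) / math.tan(x) ** 2
                   + (-1) ** n * (1 - (-1) ** (c + k)) * math.tan(x) ** 2)
        total += bracket * math.cos(k * math.pi / c) ** n
    return total / c

def A_cosec(c, n):
    """Odd n, even c: (8/c) * sum over odd k <= c/2 - 1 of cosec^2(k pi/c) cos^{n+1}(k pi/c)."""
    assert n % 2 == 1 and c % 2 == 0
    return 8.0 / c * sum(math.cos(k * math.pi / c) ** (n + 1) / math.sin(k * math.pi / c) ** 2
                         for k in range(1, c // 2, 2))

print(A_2(4, 1), A_3(4, 1), A_cosec(4, 1))
```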


In view of (2.1), the latter formula may be written as 


c/2—1 
a tesa ee 
(4) A.(n — 1) = (8/c) oo s(-3 (ak*)?, _ ai’), 
k=l n rt 
k= 1(mod 2) 
(c and n even), where

(5) $\alpha = \pi^2 n/(2c^2)$
and 
(6) f(u, w) = cosec” w eX*) 
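An elementary Taylor expansion shows the role that α plays in (4) and (5). The display below is standard calculus, offered only as an aid. It does not restate Lemma 1, whose exact form is not reproduced in this passage. Write $w = k\pi/c$ and $\alpha = \pi^2 n/(2c^2)$. Then $nw^2/2 = \alpha k^2$ and $nw^4 = 4(\alpha k^2)^2/n$, so that, for $|w| < \pi/2$,

$$\log\cos w = -\frac{w^2}{2} - \frac{w^4}{12} - \frac{w^6}{45} - \cdots,$$

$$\left(\cos\frac{k\pi}{c}\right)^{n} = \exp\left\{-\alpha k^2 - \frac{(\alpha k^2)^2}{3n} - \frac{8(\alpha k^2)^3}{45n^2} - \cdots\right\}.$$

Hence the factor $e^{-\alpha k^2}$ that appears in (12) is the leading term. The rest of the exponent is a series in $1/n$ whose coefficients are polynomials in $\alpha k^2$. This is consistent with the expansions below being organized by powers of $1/n$.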


Here, from Lemma 1, 


oOo op 
- ue(w h —h 2 2 ; 
(7) ee = PY Aww, (lw| < w/4). 
p=0 h=O 
Further, differentiating the well-known Taylor expansion of $\cot z$ about 0,

(8) $\operatorname{cosec}^2 w = w^{-2}\sum_{\nu=0}^{\infty} C_\nu w^{2\nu}$, $(|w| < \pi)$,


where $C_0 = 1$ and

(9) $C_\nu = (2\nu - 1)2^{2\nu}B_\nu/(2\nu)!$, $(\nu = 1, 2, \ldots)$,


$B_\nu$ denoting the $\nu$th Bernoulli number ($B_1 = 1/6, \ldots$; $C_1 = 1/3$, $C_2 = 1/15$,







'; = 2/189, C, = 1/675, --- ). Hence, from (6), (7) and (8), for |w| < 
x /4 and arbitrary values u, 
x be 


(10) f(u,w) = w me > Byu'w, 


p=0 had 


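where

(11) $B_{\mu h} = \sum_{\nu=0}^{\mu-h} C_\nu A_{\mu-\nu, h} \ge 0$;

here, $B_{\mu\mu} = A_{\mu\mu}$, $B_{\mu 0} = C_\mu$ (from $A_{00} = 1$, $A_{\mu 0} = 0$ for $\mu > 0$); thus, $B_{00} = 1$, $B_{10} = 1/3$, $B_{11} = 1/12$, $B_{20} = 1/15$, $B_{21} = 1/20$, $B_{22} = 1/288$.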
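The coefficients $C_\nu$ in (8) and (9) can be generated exactly. The sketch below computes Bernoulli numbers by the standard recurrence in modern indexing. In that indexing, the paper's $B_\nu$ (with $B_1 = 1/6$) corresponds to $|B_{2\nu}|$. This index translation is our assumption about the paper's convention; the value $B_1 = 1/6$ supports it. The sketch reproduces $C_1, \ldots, C_4$ and checks (8) numerically at a small w. The helper names are hypothetical.

```python
import math
from fractions import Fraction

def bernoulli(N):
    """Modern Bernoulli numbers B_0..B_N (B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = 0."""
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(math.comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def cosec_coeffs(nu_max):
    """C_0..C_{nu_max} of (8)-(9), taking the paper's B_nu to be |B_{2 nu}| (modern)."""
    B = bernoulli(2 * nu_max)
    C = [Fraction(1)]
    for nu in range(1, nu_max + 1):
        C.append((2 * nu - 1) * 2 ** (2 * nu) * abs(B[2 * nu]) / math.factorial(2 * nu))
    return C

C = cosec_coeffs(4)
print(C)
w = 0.3
series = sum(float(c) * w ** (2 * nu) for nu, c in enumerate(C)) / w ** 2
print(1 / math.sin(w) ** 2, series)
```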

THEOREM 2. Let

(12) $G_r = G_r(\alpha) = 4(\alpha/\pi)^{1/2}\sum_{k \ge 1,\ k \equiv 1 \ (\mathrm{mod}\ 2)} e^{-\alpha k^2}(2\alpha k^2)^r$,

where α is given by (5). For $r = 0, 1, \ldots$, an equivalent formula is
(13) G, = (-1)" © (-1)'A2(kr(2e)). 


sae 
k=—x 


Then, for each positive integer m and each positive constant K, there exists a constant $M > 0$, not depending on c or n, such that

(14) $T_m = A_c(n - 1) - (8/\pi)^{1/2}\sum_{\mu=0}^{m-1} n^{1/2-\mu}\sum_{h=0}^{\mu}(-1)^h B_{\mu h} G_{\mu+h-1}$

satisfies
(15) | Ta) Ss M(eR™ + no e*(1 + a? 4)), 


for each choice of the even positive integers n and c. 

PROOF. Let n, c denote even positive integers; thus, (4) holds true. Further, let $m \ge 1$ be a given integer, $K > 0$ a given constant, and $K_1$ a fixed constant $> K$. Without loss of generality, we may assume that $n^{1/2} \ge K_1 \ge 1$. Let $\lambda > 0$ be defined by $\alpha\lambda^2 = K_1 n^{1/2}$. Then

(16) $(\pi\lambda/c)^2 = (2/n)\alpha\lambda^2 = 2n^{-1/2}K_1 \le 2 < \pi^2/4$;

thus, $\lambda < c/2$. From Lemma 1, $\phi(w) \ge 0$ for $0 \le w \le \pi^2/4$; hence, from (6), the contribution to the right hand side of (4) of the terms with $k > \lambda$ is at most equal to $(8/c)(c/4)\operatorname{cosec}^2((2/n)\alpha\lambda^2)^{1/2} e^{-\alpha\lambda^2} < (\pi^2/2)((2/n)K_1 n^{1/2})^{-1} e^{-K_1 n^{1/2}} = O(e^{-Kn^{1/2}})$. Hence, from (4) and (5),


(17) $A_c(n - 1) = O(e^{-Kn^{1/2}}) + (8/\pi)(2\alpha/n)^{1/2} S$,


ia fe: eee 
S= a — f ( —= (ak*)’, = ak’). 
1<k<h n n 


k=1(mod 2 


where 







We now apply Lemma 4 with 8 = a, o = —4/n, r = 2/n, p = 2, g = 1, 
s=1, 4= 4Ki, (uw > m% arbitrary), w,; = 2 < w/4 = W. Then (2. ue 
holds, from (16) and (4/n)(ad*)’ = 4Kj = wu. Moreover, ad’ = K, n' = 
K, 2 1. Hence, in view of (10), Lemma 4 yields 


(18) |\S—S,| s Ma'n(nte*" + ne "(1 + a’™*)), 


M denoting a constant independent of a and n. Here, 


m—1l 4p Pa 


(19) s =) ZZ Byn( —4/ n)*( 2/n)**- a o~iury". 


p=0 h=0 k=l 
k=1(mod 2) 


Thus, if G, is defined by (12), (15) is an immediate consequence of (17), (18) 
and (19). That (12) and (13) are equivalent for r = 0, 1, --- , follows by sub- 
tracting the asserted relation of Lemma 5 with x = z from that with x = 0. 
Remark. In view of (3) and $\tan^2 w = w^2 + 2w^4/3 + \cdots$, it is easily seen that, for $m \le 2$, the estimate (15) holds for all positive integers n and c.
Let us introduce the distribution function

(20) $F_n(r) = P(R_n < r) + P(R_n = r)/2$

and the quasi-frequency function

(21) $f_n(r) = P(R_n = r - 1)/4 + P(R_n = r)/2 + P(R_n = r + 1)/4$.

From (1),

(22) $2F_{n-1}(c) = A_{c+2}(n - 1) - A_c(n - 1)$

and

(23) $4f_{n-1}(c + 1) = A_{c+4}(n - 1) - 2A_{c+2}(n - 1) + A_c(n - 1)$.

Hence, applying Theorem 2, one obtains an asymptotic development for the quantities $F_{n-1}(c)$ and $f_{n-1}(c + 1)$, c and n denoting even positive integers.
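Identities (22) and (23) are finite differences of (1), so for small m they can be checked against the brute-force distribution of $R_m$. In the sketch, `F_and_f` implements (20) and (21) directly. The check compares them with $2F_m(c) = A_{c+2}(m) - A_c(m)$ and $4f_m(c+1) = A_{c+4}(m) - 2A_{c+2}(m) + A_c(m)$. The helper names are hypothetical.

```python
import math
from itertools import product

def range_distribution(m):
    """Exact P(R_m = r), r = 0..m, by enumerating all 2^m equally likely paths."""
    counts = [0] * (m + 1)
    for steps in product((1, -1), repeat=m):
        z = lo = hi = 0
        for step in steps:
            z += step
            lo, hi = min(lo, z), max(hi, z)
        counts[hi - lo] += 1
    return [cnt / 2 ** m for cnt in counts]

def A(c, m):
    """Formula (2); only odd k contribute."""
    return sum(2.0 * math.cos(k * math.pi / c) ** m / math.tan(k * math.pi / (2 * c)) ** 2
               for k in range(1, c, 2)) / c

def F_and_f(dist, r):
    """(20) F(r) and (21) f(r) for dist[i] = P(R = i)."""
    def p(i):
        return dist[i] if 0 <= i < len(dist) else 0.0
    F = sum(dist[:r]) + p(r) / 2
    f = p(r - 1) / 4 + p(r) / 2 + p(r + 1) / 4
    return F, f

m = 9  # m = n - 1 with n = 10 even
dist = range_distribution(m)
for c in (2, 4, 6):
    F, _ = F_and_f(dist, c)
    _, f = F_and_f(dist, c + 1)
    print(c, 2 * F, A(c + 2, m) - A(c, m), 4 * f, A(c + 4, m) - 2 * A(c + 2, m) + A(c, m))
```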
In order to simplify these expansions, we introduce

(24) $\gamma_\mu(\alpha) = \sum_{h=0}^{\mu}(-1)^h B_{\mu h} G_{\mu+h-1}(\alpha)$,

($\mu = 0, 1, \ldots$), where $G_r(\alpha)$ is defined by (12) or (13) (the latter only for $r \ge 0$), α ranging through the positive real numbers. From Theorem 2, for each integer $m \ge 1$,


(25) $A_c(n - 1) = (8/\pi)^{1/2}\sum_{\mu=0}^{m-1}\gamma_\mu(\pi^2 n/(2c^2))\,n^{1/2-\mu} + O(n^{-m+1/2})$

if n and c are even positive integers, the remainder holding uniformly in c. From (12) and Lemma 3, for each integer r,


(26) G,(a) = O(e-*(1 + a’*)), (a > 0). 







Moreover, from (12),

(27) $\partial G_r/\partial\alpha = -(2\alpha)^{-1}(G_{r+1} - (2r + 1)G_r)$.

Hence, letting $\alpha = \pi^2 n/(2c^2)$ and $D = \partial/\partial c$,

(28) $DG_r(\alpha) = c^{-1}(G_{r+1} - (2r + 1)G_r)$,

thus,

(29) $D^2 G_r = c^{-2}(G_{r+2} - (4r + 5)G_{r+1} + (2r + 1)(2r + 2)G_r)$.

In general,

(30) $D^s G_r(\alpha) = c^{-s}\sum_{\nu=0}^{s} a_{s\nu}(r)\,G_{r+\nu}(\alpha)$,

($\alpha = \pi^2 n/(2c^2)$, $s = 0, 1, \ldots$), where the $a_{s\nu}(r)$ are certain constants, independent of n and c, which may be computed from the recursion relation $a_{s\nu}(r) = a_{s-1,\nu-1}(r) - (2r + 2\nu + s)a_{s-1,\nu}(r)$ ($a_{00}(r) = 1$; $a_{s\nu}(r) = 0$ if $\nu < 0$ or $\nu > s$).
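The recursion for $a_{s\nu}(r)$, together with (30), can be checked against finite-difference derivatives of $G_r(\pi^2 n/(2c^2))$ with respect to c. The sketch treats c as a real variable, as the differentiation in (28) does. It truncates the series (12) after a fixed number of odd k. The helper names are hypothetical.

```python
import math

def G(r, alpha, terms=200):
    """(12): G_r(alpha) = 4 (alpha/pi)^{1/2} sum_{k odd} e^{-alpha k^2} (2 alpha k^2)^r."""
    total = 0.0
    for j in range(terms):
        k = 2 * j + 1
        total += math.exp(-alpha * k * k) * (2.0 * alpha * k * k) ** r
    return 4.0 * math.sqrt(alpha / math.pi) * total

def a_coeffs(s, r):
    """[a_{s0}(r), ..., a_{ss}(r)] from a_{s,nu} = a_{s-1,nu-1} - (2r + 2nu + s) a_{s-1,nu}."""
    a = [1.0]                      # a_{00}(r) = 1
    for t in range(1, s + 1):
        prev = a + [0.0]           # a_{t-1,nu}, padded with a_{t-1,t} = 0
        a = [(prev[nu - 1] if nu >= 1 else 0.0) - (2 * r + 2 * nu + t) * prev[nu]
             for nu in range(t + 1)]
    return a

def DsG(s, r, n, c):
    """Right-hand side of (30), with alpha = pi^2 n / (2 c^2)."""
    alpha = math.pi ** 2 * n / (2.0 * c * c)
    return c ** (-s) * sum(coef * G(r + nu, alpha) for nu, coef in enumerate(a_coeffs(s, r)))

print(a_coeffs(2, 0))  # coefficients of (29) with r = 0
```

The case $s = 2$ reproduces the coefficients $(2r+1)(2r+2)$, $-(4r+5)$, $1$ of (29). The test below also compares $s = 1, 2$ with central differences in c.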

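It follows from (24) and (30) that

(31) $D^s\gamma_\mu(\pi^2 n/(2c^2)) = \gamma_{\mu s}(\pi^2 n/(2c^2))\,n^{-s/2}$, $(s = 0, 1, \ldots)$,

where

(32) $\gamma_{\mu s}(\alpha) = (2\alpha/\pi^2)^{s/2}\sum_{h=0}^{\mu}\sum_{\nu=0}^{s}(-1)^h B_{\mu h}\,a_{s\nu}(\mu + h - 1)\,G_{\mu+h+\nu-1}(\alpha)$.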
Here, the functions $\gamma_{\mu s}(\alpha)$ are explicitly known; for instance, from $B_{00} = 1$, (12) and (28),

(33) $\gamma_{01}(\alpha) = (2/\pi)^{3/2}\sum_{k=0}^{\infty} e^{-(2k+1)^2\alpha}\left(2\alpha + (2k+1)^{-2}\right)$.


Further, from (13) and (29),

(34) $\gamma_{02}(\alpha) = 2\sum_{k=1}^{\infty}(-1)^{k+1}k^2 e^{-k^2\pi^2/(4\alpha)}$.


Observe that, from (26) and (32), $\gamma_{\mu s}(\alpha)$ is a bounded function of α, $\alpha > 0$, whenever $s \ge 1$. Hence, from (31), letting

$f_\mu(n, c) = \gamma_\mu(\pi^2 n/(2c^2))$,

we have, for each positive integer q and $\Delta > 0$,

(35) $f_\mu(n, c + \Delta) - f_\mu(n, c) = \sum_{s=1}^{q-1} n^{-s/2}\gamma_{\mu s}(\pi^2 n/(2c^2))\,\Delta^s/s! + O(\Delta^q n^{-q/2})$,

the remainder holding uniformly in c. Finally, letting $q = 2m - 2\mu$, (22), (23) and (25) easily imply the following result.







THEOREM 3. Let $F_n(r)$, $f_n(r)$ be defined by (20) and (21). Then, for each positive integer m, there exists a constant M, independent of n and c, such that

(36) $\left|F_{n-1}(c) - (2/\pi)^{1/2}\sum_{\mu=0}^{m-1}\sum_{s=1}^{2m-2\mu-1} n^{-(2\mu+s-1)/2}\gamma_{\mu s}(\pi^2 n/(2c^2))\,2^s/s!\right| < Mn^{-m+1/2}$

and

(37) $\left|f_{n-1}(c + 1) - (2/\pi)^{1/2}\sum_{\mu=0}^{m-2}\sum_{s=1}^{2m-2\mu-1} n^{-(2\mu+s-1)/2}\gamma_{\mu s}(\pi^2 n/(2c^2))\,2^s(2^{s-1} - 1)/s!\right| < Mn^{-m+1/2}$

for each choice of the even positive integers n and c, where $\gamma_{\mu s}(\alpha)$ is defined by (32). Here, for each μ and each $s \ge 1$, $\gamma_{\mu s}(\alpha)$ is a bounded function of α, $\alpha > 0$.

Note that, from the remark following the proof of Theorem 2, (36) and (37) hold for each choice of the positive integers n and c, provided $m \le 2$. From (36), applied with $m = 2$, we have $F_{n-1}(c) = (8/\pi)^{1/2}\left(\gamma_{01}(\alpha) + n^{-1/2}\gamma_{02}(\alpha) + n^{-1}(\gamma_{11}(\alpha) + 2\gamma_{03}(\alpha)/3)\right) + O(n^{-3/2})$, with $\alpha = \pi^2 n/(2c^2)$; especially, from (33),

(38) $F_{n-1}(c) = (8/\pi^2)\sum_{k=0}^{\infty} e^{-\pi^2(2k+1)^2 n/(2c^2)}\left(\pi^2 n/c^2 + (2k+1)^{-2}\right) + O(n^{-1/2})$.

Further, from (37), applied with $m = 2$,

$f_{n-1}(c + 1) = (8/\pi)^{1/2}\left(n^{-1/2}\gamma_{02}(\alpha) + 2n^{-1}\gamma_{03}(\alpha)\right) + O(n^{-3/2})$,

with $\alpha = \pi^2 n/(2c^2)$; especially, from (34),

(39) $f_{n-1}(c + 1) = 8(2\pi n)^{-1/2}\sum_{k=1}^{\infty}(-1)^{k+1}k^2 e^{-(kc)^2/(2n)} + O(n^{-1})$.

As was shown by Feller [2] (cf. also Darling and Siegert [1], p. 638), the slightly weaker result obtained by replacing in (39) the remainder $O(n^{-1})$ by $o(1)$ holds whenever the $\xi_i$ are independently and identically distributed random variables with $E(\xi_i) = 0$, $\mathrm{Var}(\xi_i) = 1$.
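The leading terms of (38) and (39) can be compared with the exact values obtained from (22), (23) and (2). The sketch assumes these main terms in the form $F_{n-1}(c) \approx (8/\pi^2)\sum_{k\ge0}e^{-(2k+1)^2\alpha}(2\alpha + (2k+1)^{-2})$ and $f_{n-1}(c+1) \approx 8(2\pi n)^{-1/2}\sum_{k\ge1}(-1)^{k+1}k^2e^{-(kc)^2/(2n)}$, with $\alpha = \pi^2 n/(2c^2)$. It evaluates both at $(n, c) = (400, 40)$ and $(6400, 160)$, which share the same α. If the stated remainders are sharp, the error of the first approximation should shrink roughly like $n^{-1/2}$, and that of the second faster. The helper names are hypothetical.

```python
import math

def A(c, m):
    """Formula (2); only odd k contribute."""
    return sum(2.0 * math.cos(k * math.pi / c) ** m / math.tan(k * math.pi / (2 * c)) ** 2
               for k in range(1, c, 2)) / c

def F_exact(n, c):
    """F_{n-1}(c) from (22)."""
    return 0.5 * (A(c + 2, n - 1) - A(c, n - 1))

def f_exact(n, c):
    """f_{n-1}(c+1) from (23)."""
    return 0.25 * (A(c + 4, n - 1) - 2.0 * A(c + 2, n - 1) + A(c, n - 1))

def F_leading(n, c, terms=50):
    """Main term of the expansion of F_{n-1}(c)."""
    total = 0.0
    for k in range(terms):
        j = 2 * k + 1
        total += (math.exp(-j * j * math.pi ** 2 * n / (2.0 * c * c))
                  * (math.pi ** 2 * n / c ** 2 + 1.0 / j ** 2))
    return 8.0 / math.pi ** 2 * total

def f_leading(n, c, terms=50):
    """Main term of the expansion of f_{n-1}(c+1) (Feller's limiting density)."""
    total = sum((-1) ** (k + 1) * k * k * math.exp(-(k * c) ** 2 / (2.0 * n))
                for k in range(1, terms))
    return 8.0 / math.sqrt(2.0 * math.pi * n) * total

for n, c in ((400, 40), (6400, 160)):  # same alpha = pi^2 n / (2 c^2)
    print(n, F_exact(n, c), F_leading(n, c), f_exact(n, c), f_leading(n, c))
```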

REFERENCES 

[1] D. A. Darling and A. J. F. Siegert, “The first passage problem for a continuous Markov process,” Ann. Math. Stat., Vol. 24 (1953), pp. 624-639.

[2] W. Feller, “The asymptotic distribution of the range of sums of independent random variables,” Ann. Math. Stat., Vol. 22 (1951), pp. 427-432.

[3] B. V. Gnedenko, “Some results on the maximal deviation between two empirical distributions,” Dokl. Akad. Nauk SSSR, Vol. 82 (1952), pp. 661-663.

[4] B. V. Gnedenko and V. S. Korolyuk, “On the maximal deviation of two empirical distribution functions,” Dokl. Akad. Nauk SSSR, Vol. 80 (1951), pp. 525-528.

[5] B. V. Gnedenko and E. L. Rvačeva, “On a problem of comparison of two empirical distributions,” Dokl. Akad. Nauk SSSR, Vol. 82 (1952), pp. 513-516.

[6] F. J. Massey, Jr., “The distribution of the maximum deviation between two sample cumulative step functions,” Ann. Math. Stat., Vol. 22 (1951), pp. 125-128.





ASYMPTOTIC MINIMAX CHARACTER OF THE SAMPLE DISTRIBUTION 
FUNCTION FOR VECTOR CHANCE VARIABLES 


By J. Kiefer¹ and J. Wolfowitz²


Cornell University 


Summary. The purpose of this paper is to prove Theorem 1 stated in Section 1 
below and Theorem 2 of Section 6 and the results of Section 7. These theorems 
are the generalizations to vector chance variables of Theorems 4 and 5 and 
Section 6 of [1], and state that the sample distribution function (d.f.) is asymp- 
totically minimax for the large class of weight functions of the type described 
below. The main difficulties are embodied in the proof of Theorem 1 (Sections 
2 to 5), where the loss function is a function of the maximum difference between 
estimated and true d.f. The proof utilizes the results of [2] and is not a straightforward extension of the result of [1], because the sample d.f. is no longer “distribution free” (even in the limit), and hence it is necessary to prove the uniformity of approach, to its limit, of the d.f. of the normalized maximum deviation between sample and population d.f.’s (for a certain class of d.f.’s). The latter fact enables us essentially to infer the existence of a uniformly (with the sample number) approximately least favorable (to the statistician) d.f., by means of which the proof of the theorem is achieved. Theorem 2 (Section 6) considers
loss functions of integral type, and more general loss functions are treated in 
Section 7. 


1. Introduction and preliminaries. The problem of finding a reasonable estimator of an unknown distribution function (d.f.) F in one or more dimensions is an old one. In the one-dimensional case the first extensive optimality results were obtained in [1]. It was shown there that, although a minimax procedure for sample size n may depend on the weight function as well as on n, the sample d.f. $\phi_n^*$ is asymptotically minimax as $n \to \infty$ for a very large class of weight functions which includes almost any weight function of practical interest. Also, an exact minimax procedure is extremely tedious to calculate in most practical cases, and is less convenient to use in practice than is $\phi_n^*$. Moreover, one can obtain from [1] a bound on the relative difference between the maximum losses which can be encountered from using $\phi_n^*$ or the actual minimax procedure, and for many common weight functions this bound indicates that $\phi_n^*$ is very close to being minimax for fairly small values of n.

For dimension m > 1 the minimax problem presents difficulties which are 
not present when m = 1. (An outline of the main ideas and difficulties encoun- 
tered in the proofs when m = 1 or when m > 1 will be given in Section 4; the 


Received April 21, 1958. 

1 The research of this author was under contract with the Office of Naval Research. 

2 The research of this author was supported by the U. S. Air Force under Contract No.
AF 18(600)-685, monitored by the Office of Scientific Research. 







464 J. KIEFER AND J. WOLFOWITZ 


proof there is completed in Section 5; additional considerations for various weight functions are outlined in Sections 6 and 7.) These difficulties stem from the fact that neither $\phi_n^*$ nor any other known procedure which seems a reasonable candidate for optimality has the distribution-free property possessed by $\phi_n^*$ when $m = 1$. This fact has led investigators of the problem when $m > 1$ to try (unsuccessfully) to find reasonable distribution-free procedures. Such investigations now seem to have been aimed in the wrong direction; for the main result of the present paper is that $\phi_n^*$ is still asymptotically minimax for a large class of weight functions, even though it is no longer distribution free.

The proof of the result just stated presents new difficulties far greater than those encountered when $m = 1$. In order to describe these difficulties briefly, let us suppose for the moment that the risk function is the expected value (under the true F) of $n^{1/2}$ times the maximum absolute deviation between estimated and true d.f. The computation of this risk function or its limit as $n \to \infty$ for the sequence of procedures $\phi_n^*$ (or any other reasonable sequence of procedures) is known to present formidable difficulties, even for very simple continuous F (e.g., the uniform distribution on the unit square when $m = 2$). Our method of proof circumvents such a computation by showing that, when n is suitably large, the risk function of $\phi_n^*$ is changed arbitrarily little from what it would be if the maximum deviation were taken over a large but finite set of points instead of over all of m-space (this uses a result of [2]). Thus, the problem is reduced to a multinomial problem, similar to the reduction of [1] when $m = 1$, and we can circumvent the explicit computation of the risk there in a manner like that used in the multinomial case in [1], and which will be described in Section 3 below. But there remains another difficulty: in order to use a Bayes technique like that of [1] to prove the asymptotic minimax character of $\phi_n^*$, we must show that there is a d.f. $F_1$ at which the risk function of $\phi_n^*$ is almost a maximum for all sufficiently large n; i.e., that the location of some approximate maximum does not “wander around” too much with n. Because of the distribution-free nature of the chance loss (for many common loss functions) under $\phi_n^*$ when $m = 1$, the existence of such an $F_1$ was automatic there (any continuous d.f. could be used); for $m > 1$, our proof requires the result of Lemma 1 of Section 2 below to obtain the existence of such an $F_1$, at least when F is restricted to belong to a class of d.f.’s which in Section 5 is seen to be dense enough in an appropriate sense to yield the desired result. Once such an $F_1$ is known to exist, a sequence of approximately least favorable a priori distributions can be constructed for the approximating multinomial problem in the manner of [1]; this will be described in Section 4.

Aside from the difficulties described in the previous paragraph, the proofs of minimax results when $m > 1$ are very similar to those when $m = 1$. Therefore, rather than repeat all of the details of [1], in each of Sections 4, 6, and 7 we will first describe the idea of the proof and then indicate the modifications needed in the proof of the corresponding section of [1] to make it apply when $m > 1$.

We now give the notation used in this paper. m will denote any positive integer, 





SAMPLE DISTRIBUTION FUNCTION 465 


fixed throughout the sequel. $\mathcal{F}$ denotes the class of all d.f.’s on Euclidean m-space $R^m$, and $\mathcal{F}^c$ denotes the subclass of continuous members of $\mathcal{F}$. Let D be any subclass of the space of real functions on $R^m$. For simplicity we assume $\mathcal{F} \subset D$, although it is really only necessary that D contain every possible function of the form $S_n$ (defined below), for all n and $z^{(n)}$. Let $\mathfrak{B}$ be the smallest Borel field on D such that every element of $\mathcal{F}$ is an element of $\mathfrak{B}$ and such that, for every positive integer k, real numbers $a_1, \ldots, a_k$ and m-vectors $t_1, \ldots, t_k$, the set $\{g \mid g \in D;\ g(t_1) < a_1, \ldots, g(t_k) < a_k\}$ is in $\mathfrak{B}$. (For example, we might have $D = \mathcal{F}$ and $\mathfrak{B}$ the Borel sets of the usual metric topology.) Let $\mathfrak{D}_n$ be the class of all real functions $\phi_n$ on $\mathfrak{B} \times R^{mn}$ such that $\phi_n(\cdot\,; z)$ is a probability measure ($\mathfrak{B}$) on D for each z in $R^{mn}$ and such that $\phi_n(A; \cdot)$ is a Borel-measurable function on $R^{mn}$ for each A in $\mathfrak{B}$.

We now describe the statistical problem. Let $Z_1, \ldots, Z_n$ be independently and identically distributed m-vectors, each distributed according to some d.f. F about which it is known only that $F \in \mathcal{F}$ (or $\mathcal{F}^c$ or some other suitably dense subclass of $\mathcal{F}$). The statistician wants to estimate F. Write $Z^{(n)} = (Z_1, \ldots, Z_n)$ and $z^{(n)} = (z_1, \ldots, z_n)$, where $z_i \in R^m$. Having observed $Z^{(n)} = z^{(n)}$, the statistician uses some decision function $\phi_n$ (a member of $\mathfrak{D}_n$) as follows: a function $g \in D$ is selected by means of a randomization according to the probability measure $\phi_n(\cdot\,; z^{(n)})$ on D; the function g so selected (which need not even be a member of $\mathcal{F}$) is then the statistician’s estimate of the unknown F. It is desirable to select a procedure $\phi_n$ which may be expected to yield a g which will lie close to the true F, whatever it may be; the precise meaning of “close” will be reflected by a weight function $W_n(F, g)$ which measures the loss when F is the true distribution function and g is the estimate of it. The probability of making a decision in A when $\phi_n$ is used and F is the true d.f. is


(1.1) $\mu_{F,\phi_n}(A) = \int \phi_n(A; z^{(n)}) \prod_{i=1}^{n} F(dz_i)$,


which, as a function of A, will be a probability measure on D (see the next paragraph). Denoting expectation of a function on D with respect to this measure by $E_{F,\phi_n}$ (the symbol $P_{F,\phi_n}$ is used analogously, and the subscript $\phi_n$ will be omitted when it is not relevant), the risk function of the procedure $\phi_n$ is defined by


(1.2) $r_n(F, \phi_n) = E_{F,\phi_n} W_n(F, g)$;

i.e., it is the expected loss when F is true and $\phi_n$ is used. A sequence $\{\phi_n\}$ of procedures is said to be asymptotically minimax relative to a sequence $W_n$ of weight functions and a subclass $\mathcal{F}'$ of $\mathcal{F}$ if

(1.3) $\lim_{n\to\infty}\dfrac{\sup_{F \in \mathcal{F}'} r_n(F, \phi_n)}{\inf_{\phi_n' \in \mathfrak{D}_n}\sup_{F \in \mathcal{F}'} r_n(F, \phi_n')} = 1$.
(We note that this is a stronger property than that obtained by suppressing the supremum operation in the numerator and asking that the upper limit as










$n \to \infty$ be $\le 1$ for each F; this latter asymptotic property is much easier to verify than (1.3).) A nonrandomized decision function is one which for each $z^{(n)}$ assigns probability one to a single element (depending on $z^{(n)}$) of D. By $\phi_n^*$ we denote the nonrandomized procedure which chooses as decision the “sample d.f.” $S_n$ defined by


$S_n(z) = n^{-1}$ (number of $Z_i \le z$, $1 \le i \le n$),

where as usual $Z \le z$ means that each component of Z is $\le$ the corresponding component of z. We shall not explicitly display the dependence of the chance function $S_n$ on $Z^{(n)}$.

Obvious measurability considerations arise in connection with (1.1), (1.2), etc. These are handled exactly as in Section 1 of [1].

We can now state the main result of this paper, whose proof will occupy the next four sections (modifications and extensions are considered in Sections 6 and 7).

THEOREM 1. Suppose $W_n(F, g) = W(n^{1/2}\sup_z|g(z) - F(z)|)$, where, for $r \ge 0$, W(r) is continuous, nonnegative, monotonically nondecreasing, not identically zero, and satisfies

(1.4) $\int_0^\infty r\,W(r)\,e^{-c_m r^2}\,dr < \infty$,

where $c_m$ is given by (1.8). Then $\{\phi_n^*\}$ is asymptotically minimax relative to $\{W_n\}$ and $\mathcal{F}$.
Before listing the results of [2] which will be used in the present paper, we introduce some additional notation. When $Z_1$ has d.f. F, define

$D_n = \sup_{z \in R^m}|S_n(z) - F(z)|$

and

$G_n(r; F) = P_F\{D_n < r/n^{1/2}\}$.

For k a positive integer, write $A_k$ for the subset of $(k + 1)^m$ points in the m-dimensional unit cube $I^m = \{x \mid 0 \le x \le 1,\ x \in R^m\}$ for which each coordinate is an integral multiple of $1/k$. Write

$D_{nk} = \sup_{z \in A_k}|S_n(z) - F(z)|$

and

$G_{nk}(r; F) = P_F\{D_{nk} < r/n^{1/2}\}$.

We also write

$G_{\infty k}(r; F) = \lim_{n\to\infty} G_{nk}(r; F)$;

the existence of this limit follows from the multivariate central limit theorem.
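For $m = 2$, the statistics $D_n$ and $D_{nk}$ can be illustrated directly. The sketch assumes that F is uniform on the unit square, so that $F(x, y) = xy$, a member of the class of continuous d.f.'s with uniform marginals. It computes $D_n$ exactly by scanning the finitely many corners at which $|S_n - F|$ can reach its supremum. It computes $D_{nk}$ on the grid $A_k$. Since $A_k \subset A_{2k}$, $D_{nk}$ cannot decrease along $k = 1, 2, 4, \ldots$, and it never exceeds $D_n$. The helper names are hypothetical.

```python
import random

def sample_df(points, z):
    """S_n(z) = n^{-1} #{i : Z_i <= z componentwise}."""
    return sum(all(p <= q for p, q in zip(pt, z)) for pt in points) / len(points)

def D_n_uniform(points):
    """D_n for m = 2 and F(x, y) = x y on the unit square.

    S_n - F is largest at lower-left corners of the cells cut out by the sample
    coordinates; F - S_n is approached at upper-right corners (from below). So it
    suffices to scan coordinates taken from the sample together with the value 1.
    """
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    best = 0.0
    for x in xs:
        for y in ys:
            le = sum(1 for px, py in points if px <= x and py <= y) / n
            lt = sum(1 for px, py in points if px < x and py < y) / n
            best = max(best, le - x * y, x * y - lt)
    return best

def D_nk(points, k):
    """Supremum over the grid A_k of (k+1)^2 points with coordinates i/k."""
    grid = [i / k for i in range(k + 1)]
    return max(abs(sample_df(points, (x, y)) - x * y) for x in grid for y in grid)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(50)]
print(D_n_uniform(pts), [D_nk(pts, k) for k in (1, 2, 4, 8, 16, 32)])
```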
Finally, let $\mathcal{F}^*$ be the class of d.f.’s F which are in $\mathcal{F}^c$ and for which each one-dimensional marginal d.f. of F is uniform on $I^1$. Clearly, if $Z_1$ has d.f. F in $\mathcal{F}^c$, we can perform continuous transformations on the components of $Z_1$ so as to make the result have a d.f. $F^*$ in $\mathcal{F}^*$, without changing $G_n$. This fact will be used in the sequel.

The results of [2] which will be used in the present paper are the following (some of these results hold with little or no modification for F in $\mathcal{F}$, but we need them here only for F in $\mathcal{F}^*$):

A. (Theorem 2 of [2].) For F in $\mathcal{F}^*$, there is a d.f. $G(\cdot\,; F)$ such that

(1.5) $\lim_{n\to\infty} G_n(r; F) = G(r; F)$

at every continuity point of the latter. Moreover, for F in $\mathcal{F}^*$,

(1.6) $\lim_{k\to\infty} G_{\infty k}(r; F) = G(r; F)$

and (obviously)

(1.7) $\lim_{k\to\infty} G_{nk}(r; F) = G_n(r; F)$.

B. (Theorem 1 of [2].) There are positive constants $c_m'$ and $c_m$ (independent of n, F, and r) such that, for F in $\mathcal{F}^*$, all n, and all $r \ge 0$,

(1.8) $1 - G_n(r; F) < c_m' e^{-c_m r^2}$.

Further remarks on possible values of $c_m$ are contained in [1] and [2].

C. For each F in $\mathcal{F}$ there is an $F_1$ in $\mathcal{F}^*$ such that, for all n and r,

(1.9) $G_n(r; F_1) \le G_n(r; F)$.

(This is fairly obvious; see [2] for further discussion.)


Of course, (1.8) and (1.9) also hold in the limit; i.e., with the subscript n 
deleted. 


D. (A consequence of (3.11) of [2].) For all F in $\mathcal{F}^*$, and for each $d > 0$,


ak 


(1.10) Gaa(r; PF) — G(r +d; F) < = 


+ c.k exp {—c,d°k"?} + ¢ ot +), 
a\k on 


where the c; are positive constants depending only on m. 

A further result of [2] will be given in Lemma 2 of Section 2, after some addi- 
tional notation has been introduced. 

In most of the arguments of this paper we will be dealing with F’s which are in $\mathcal{F}^c$. To simplify the discussion in such cases, we shall always assume that, for every real number t and integer j, at most one $Z_i$ has its jth coordinate equal to t. The probability that this be not so is zero.


2. Uniformity of approach of $G_n$ to G in the subclass $\mathcal{F}_\epsilon$. The purpose of this section is to prove Lemma 1 (stated below), which will be used in Section 4 to prove the existence of an $F_1$ with the properties described in Section 1, when F is restricted to belong to a suitable subclass $\mathcal{F}_\epsilon$ of $\mathcal{F}$. This and the multinomial







result of Section 3 will then be used in Section 4 to demonstrate Theorem 1 with $\mathcal{F}$ replaced by $\mathcal{F}_\epsilon$. The proof of Theorem 1 is then completed in Section 5 by showing that $\mathcal{F}_\epsilon$ is suitably dense in $\mathcal{F}$ as $\epsilon \to 0$. Thus, although by far the greatest amount of new effort needed to prove Theorem 1 when $m > 1$, over what is needed when $m = 1$, is contained in the arguments of the present section, the reader who is interested mainly in the ideas of the statistical proof may read the statement of Lemma 1 and then go on to Section 3.

We first introduce some notation which will be used in this and subsequent sections. Let ε be a small positive number and let r be a positive number, both of which will be fixed in the present section. Other ε’s with subscripts will be used in this paper to denote positive variables which will approach zero. The symbol $o(1 \mid \epsilon_1)$ is to denote a quantity which, as $\epsilon_1$ approaches zero, approaches zero uniformly in all other relevant quantities. Sometimes the latter will be explicitly indicated. Thus $o(1 \mid \epsilon_1 \mid n, F)$ denotes a quantity which approaches zero, uniformly for all n (sometimes for all large n) and for all F (either in $\mathcal{F}$ or in some indicated subclass), as $\epsilon_1 \to 0$. The symbol $o(1 \mid \epsilon_1, n \mid F)$ denotes a quantity which approaches zero as $\epsilon_1 \to 0$, $n \to \infty$, uniformly in F (either in $\mathcal{F}$ or some indicated subclass). The symbol $o(1 \mid n \mid F)$ denotes a quantity which approaches zero as $n \to \infty$, uniformly in F (either in $\mathcal{F}$ or some indicated subclass). The symbol $o(1 \mid d, N(d) \mid \cdot, \cdot)$ is to mean a quantity which approaches zero as $d \to 0$ while n stays larger than a suitable function N(d) of d (which may change in various appearances of the symbol, although we shall sometimes use N, N', etc., to denote several such functions which arise in the proof of the same lemma), and the approach of this quantity to zero is uniform in all other relevant quantities, which may be indicated where the dots are. The symbols $o(1 \mid \epsilon, N(\epsilon) \mid \cdot, \cdot)$ and $o(1 \mid k, N(k) \mid \cdot, \cdot)$ (with $k \to \infty$) will be used similarly. Finally the symbol θ will always denote a generic quantity $\le 1$ in absolute value; two θ’s in different places need not be the same. The quantity d will always be $> 0$.

Let $\mathcal{F}_\epsilon$ be the subclass of those d.f.’s F in $\mathcal{F}^*$ which have a Lebesgue density $f_F$ in the subset of all points in $I^m$ where at least one coordinate is $\ge 1 - \epsilon$, and such that $\frac12 \le f_F \le 2$ almost everywhere in this region. The proofs of this section actually hold when $\mathcal{F}_\epsilon$ is replaced by a somewhat larger class; but this is of little importance, the main use of Lemma 1 being to prove Theorem 1. (The relationship of $\mathcal{F}_\epsilon$ to $\mathcal{F}$ will be stated in Section 4.)

LEMMA 1. We have, for each fixed m,

(2.1) $|G_n(r; F) - G(r; F)| = o(1 \mid n \mid F \in \mathcal{F}_\epsilon)$.


The proof of Lemma 1 will require several supplementary lemmas. The proofs 
for all m > 1 are essentially the same, but the proof is most easily written out 
and followed in the case m = 2. Hence, throughout the remainder of this section 
we shall carry out all proofs in the case m = 2. The modifications in the statements 
and proofs which are necessary when m > 2 will usually be completely obvious; 







and we shall explicitly mention, at appropriate points in the argument, those 
modifications which are not completely obvious. 

Thus, we can write in coordinates $Z_i = (X_i, Y_i)$ and $z = (x, y)$ throughout the remainder of the section. (In most of the corresponding arguments for the case of m components, x will stand for the first $m - 1$ components of z, and y will stand for the last component of z.)

The idea of the proof of Lemma 1 is that (1.10) should somehow be used to prove Lemma 5, which, by a suitable uniformity result (Lemma 7) on the approach of the multinomial distribution to its limit, will yield (2.1). What is needed to obtain Lemma 5 from (1.10) is Lemma 4, the idea of which is that if $n^{1/2}|S_n(z) - F(z)|$ attains the value r somewhere, then it is very likely to attain the value $r + d$ somewhere, if d is small; it is the structure of $\mathcal{F}_\epsilon$ which is used, in (2.18), to prove this.

For $0 < \epsilon_1 < 1$ we define the events

(2.2) $L_1(\epsilon_1) = \{\sup_{0 \le x \le \epsilon_1}|S_n(z) - F(z)| \ge r/n^{1/2}\}$

and

(2.3) $L_2(\epsilon_1) = \{\sup_{1-\epsilon_1 \le x \le 1,\ 1-\epsilon_1 \le y \le 1}|S_n(z) - F(z)| \ge r/n^{1/2}\}$.

(For the case of vectors with m components, the supremum in (2.2) is taken over the set where at least one of the $m - 1$ components of x is $\le \epsilon_1$; in (2.3), it is taken over the set where all m components are $\ge 1 - \epsilon_1$.)
The next two lemmas lead up to Lemma 4. 
LEMMA 2. We have

(2.4) $P_F\{L_1(\epsilon_1)\} = o(1 \mid \epsilon_1, n \mid F \in \mathcal{F}^*)$

and

(2.5) $P_F\{L_2(\epsilon_1)\} = o(1 \mid \epsilon_1, n \mid F \in \mathcal{F}^*)$.

PROOF. An upper bound on the probability of $L_1(\epsilon_1)$ can be obtained from
equations (3.6), (3.9), and (3.10) of [2], if, in the latter, we set h = 0,7 = 1, 
k = 1/e (the relevant argument of [2] is valid even if k is not an integer), d = r. 
We obtain 


: ] 2 1/2 
Pr{Li(a)} < ; +c, exp {—cr/2e"} 
(2.6) ' 


7 ti 
$s - (36 + “) = o(lle,n|Fes*). 
r n 


We shall now use an argument like that by which (2.22) of [2] was proved, in order to prove (2.5). The event $L_2(\epsilon_1)$ implies the occurrence of at least one of the following events:







1—e,;5251 


L} = { sup | (number of Z,, ---, Z, which satisfy 
1—e, 351 


2 
0s X;8 2,y S Yi S 1) — expected number| = sl, 


L; = sup | (number of Z,, --- , Z, which satisfy 
(2.7) 0303 1/2 
xs X;85 1,y S Y: S 1) — expected number| 2 7 
L; = { sup | (number of Z,, --- , Z, which satisfy 
Sgt c 
zs X;5 1,08 Y; Ss y) — expected number| 2 5 \. 


The random variables in the original sequence $\{Z_i\}$ all have the same distribution as $Z_1 = (X_1, Y_1)$. Apply the argument by which (2.4) was obtained to sequences all of whose members have the same distribution as each of the following, in order: $(1 - Y_1, X_1)$, $(1 - X_1, 1 - Y_1)$, and $(1 - X_1, Y_1)$. We obtain that

(2.8) $P_F\{L_i'\} = o(1 \mid \epsilon_1, n \mid F \in \mathcal{F}^*)$, $i = 1, 2, 3$.

Hence (2.5) is verified.
Define the events 





(2.9) L;(a) = { sup |S,(2,1) — zl < Lat 
0s2<1 en 
and 
(2.10) $L_4(\epsilon_1) = \{\sup_{\epsilon_1 \le x \le 1,\ 0 \le y \le 1-\epsilon_1}|S_n(z) - F(z)| \ge r/n^{1/2}\}$.


From (1.8) we obtain that

(2.11) $P_F\{L_3(\epsilon_1)\} = 1 - o(1 \mid \epsilon_1, n \mid F \in \mathcal{F}^*)$.


Write $L(\epsilon_1) = L_3(\epsilon_1) \cap L_4(\epsilon_1)$. Whenever $L(\epsilon_1)$ occurs we can define chance variables H and T as follows: $H = h$, $\epsilon_1 \le h \le 1$, and $T = t$, $0 \le t \le 1 - \epsilon_1$, if

(2.12) $|S_n(h, t) - F(h, t)| \ge r/n^{1/2}$

and

(2.13) $|S_n(h', t') - F(h', t')| < r/n^{1/2}$

for $\epsilon_1 \le h' \le 1$, $0 \le t' < t$, as well as for $\epsilon_1 \le h' < h$, $t' = t$. (In the m-component case, h' has all $m - 1$ components $\ge \epsilon_1$, and h can be specified by any rule which does not depend on y for $y > t$ and such that (2.12) holds.) Thus, if a horizontal line $y = t'$ is swept upward starting at $t' = 0$, the line $y = t$ is






the first for which (2.12) can hold, and h is a well-defined value such that it does.


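LEMMA 3. We have, for some N(d) and $\epsilon_1$ given by (2.21),

(2.14) $P_F\left\{\sup_{T \le y \le 1}|S_n(H, y) - F(H, y)| \ge (r + d)/n^{1/2} \,\middle|\, L(\epsilon_1)\right\} = 1 - o(1 \mid d, N(d) \mid F \in \mathcal{F}_\epsilon)$.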
PROOF. We suppose that

(2.15) $S_n(H, T) - F(H, T) = r/n^{1/2}$,

and we will prove that, conditional on $L(\epsilon_1)$ occurring, the probability that

(2.16) $S_n(H, y) - F(H, y) \ge (r + d)/n^{1/2}$

for some y, $T < y \le 1$, is $1 - o(1 \mid d, N(d) \mid F \in \mathcal{F}_\epsilon)$. This will be enough to prove (2.14), for (a) if the left member of (2.15) is greater than $r/n^{1/2}$, the result we want to prove is a fortiori true, and (b) if the left member of (2.15) is $\le -r/n^{1/2}$, it is proved, in the same way as below, that the probability (conditional on $L(\epsilon_1)$ occurring) that the left member of (2.16) be $\le -(r + d)/n^{1/2}$ for some y, $T < y \le 1$, is $1 - o(1 \mid d, N(d) \mid F \in \mathcal{F}_\epsilon)$.
Define

(2.17) $n_1 = n(S_n(H, 1) - S_n(H, T))$, $\eta = \dfrac{F(H, y) - F(H, T)}{H - F(H, T)}$, $n_1(\eta) = n(S_n(H, y) - S_n(H, T))$.
From (2.9), (2.10), (2.15), and the definition of $\mathcal{F}_\epsilon$, we have, in $L(\epsilon_1)$, if $\epsilon_1 < \epsilon$,


1/2 
(2.18) mm = n(H — F(H,T)) + A rn? > neéi/2 — na(h + r), 
€i € 

which goes to ∞ as $n \to \infty$ (uniformly in H and $T \le 1 - \epsilon_1$), and is thus arbitrarily large for $n >$ some $N'(\epsilon_1)$. Using (2.15) we find that (2.16) is equivalent to

(2.19) $\dfrac{n_1(\eta)}{n_1} - \dfrac{\eta\,n(H - F(H, T))}{n_1} \ge \dfrac{d\,n^{1/2}}{n_1}$.

From (2.18) we obtain that the probability that (2.19) occur for some η, $0 \le \eta \le 1$, is $\ge$ the probability that, for some η,


- € —1 —2. 
2.20) mG) _ > 2de + 4a 9 
oo _ 1/2 172? 
my nN; ny’ 


y 
provided that $\epsilon_1$ is small enough and $n >$ a suitable $N_1(\epsilon_1)$.
Now set 


(2.21) « = a" 







and suppose that d is small enough that (2.20) holds when $\epsilon_1$ is given by (2.21).

Let $\{W(t), 0 \le t < \infty\}$ be the separable (Wiener) process with independent, normally distributed increments, $W(0) = 0$, $E(W(t)) = 0$, $\mathrm{Var}(W(t)) = t$. Given H, T, and $n_1$, the left member of (2.20) clearly is distributed as the difference between a sample d.f. and the uniform d.f. on the one-dimensional interval $0 \le \eta \le 1$, when the sample d.f. is that of $n_1$ independent, uniformly distributed random variables. It follows from [3] and [4] and the fact that $n_1 \to \infty$ as $n \to \infty$ that, under (2.21), the conditional probability (given that $L(\epsilon_1)$ occurs) that (2.20) hold for some η approaches, uniformly in $\mathcal{F}_\epsilon$, as $n \to \infty$,


(2.22) P{W(t) = 2d** + (2d°* + 4d~”)t for some t > 0}. 
The latter is, by [4], equation (4.2), 
(2.23) exp {— 2(2d°*)(2d"* + 4d-"”)} 


which approaches one as d — 0. Hence, for d sufficiently small and n > some 

N(d), the conditional probability (given that L(«) occurs) that (2.16) holds 

for some y > T is arbitrarily close to 1, uniformly in §, . This proves (2.14). 
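The limit (2.22)-(2.23) rests on the classical formula $P\{W(t) \ge a + bt \text{ for some } t > 0\} = e^{-2ab}$ for $a, b > 0$. The Python sketch below is an illustration only, not part of the original argument. It does three things:

- evaluates that formula;
- checks it by a crude Monte Carlo simulation;
- includes the Brownian-bridge version obtained from $B(s) = (1 - s)W(s/(1 - s))$.

The function names, the time step, the horizon, and the seed are arbitrary choices made for the illustration.

```python
import math
import random


def crossing_prob(a: float, b: float) -> float:
    """P{W(t) >= a + b*t for some t > 0} for standard Brownian motion, a, b > 0."""
    return math.exp(-2.0 * a * b)


def bridge_crossing_prob(a: float, b: float) -> float:
    """P{B(s) >= a + b*s for some s in [0, 1]} for a Brownian bridge B.

    B(s) = (1 - s) W(s / (1 - s)) turns the line a + b*s into a + (a + b)*t.
    """
    return crossing_prob(a, a + b)


def crossing_prob_mc(a: float, b: float, horizon: float = 8.0, dt: float = 0.01,
                     paths: int = 2000, seed: int = 1) -> float:
    """Crude Monte Carlo estimate of crossing_prob(a, b) on a time grid.

    Checking the boundary only at grid points and stopping at a finite horizon
    both make this estimate somewhat smaller than the exact value.
    """
    rng = random.Random(seed)
    steps = int(horizon / dt)
    sd = math.sqrt(dt)
    hits = 0
    for _ in range(paths):
        w = 0.0
        for i in range(1, steps + 1):
            w += rng.gauss(0.0, sd)          # independent normal increment
            if w >= a + b * i * dt:          # boundary reached at time i*dt
                hits += 1
                break
    return hits / paths
```

For example, `crossing_prob(1e-3, 1.0)` is about 0.998. A vanishing intercept drives the probability to one even when the slope is large, which is the mechanism behind (2.23).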
Lemma 4. We have, for some $N(d)$,

(2.24) $G_n(r + d; F) - G_n(r; F) = o(1 \mid d, N(d) \mid F \in \mathfrak{F}_\epsilon)$.

Proof. Substitute (2.21) into (2.2), (2.4), (2.9), (2.10) and (2.11), none of which previously depended on $d$ in any way. Using Lemma 3, we then have

(2.25) $P_F\{r \le \sup_{z:\, y \le 1 - \epsilon} n^{1/2}|S_n(z) - F(z)| \le r + d\} = o(1 \mid d, N(d) \mid F \in \mathfrak{F}_\epsilon)$,

where of course the $N(d)$ may differ from that of Lemma 3. Now, the definition of $\mathfrak{F}_\epsilon$ is such that, by interchanging the roles of $x$ and $y$, we obtain, in the same way that (2.25) was obtained,

(2.26) $P_F\{r \le \sup_{z:\, x \le 1 - \epsilon} n^{1/2}|S_n(z) - F(z)| \le r + d\} = o(1 \mid d, N(d) \mid F \in \mathfrak{F}_\epsilon)$.

(In the case of vectors with $m$ components, there are $m - 2$ additional analogues of (2.26).) Finally, substituting (2.21) into (2.5) and combining the result with (2.25) and (2.26), we obtain (2.24).

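Lemma 5. We have

(2.27) $0 \le G_{nk}(r; F) - G_n(r; F) = o(1 \mid k, N'(k) \mid F \in \mathfrak{F}_\epsilon)$.

Proof. The left inequality of (2.27) is trivial, since a supremum over the lattice points $A_k$ cannot exceed the supremum over all points. Adding (1.10) and (2.24), we have

(2.28) $G_{nk}(r; F) - G_n(r; F) \le o(1 \mid d, N(d) \mid F \in \mathfrak{F}_\epsilon) + c_0 k \exp\{-c\epsilon_1 dk\} + \epsilon/(d^2 k) + \epsilon k/n + c/(d^4 n)$.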


Let $\epsilon' > 0$ be given arbitrarily. Let $d_1 > 0$ be such that the first term on the right side of (2.28) is $< \epsilon'/3$ if $d = d_1$ and $n > N(d_1)$. Let $k_1$ be such that the





SAMPLE DISTRIBUTION FUNCTION 473 


sum of the next two terms on the right side of (2.28) is $< \epsilon'/3$ when $d = d_1$ and $k > k_1$. For $k > k_1$, let $N'(k)$ be $> N(d_1)$ and be such that the sum of the last two terms of (2.28) is $< \epsilon'/3$ when $d = d_1$ and $n > N'(k)$. Then, putting $d = d_1$, we have that the right side of (2.28) is $< \epsilon'$ when $k > k_1$ and $n > N'(k)$. Thus, Lemma 5 is proved.
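The left inequality in (2.27) only uses the fact that a supremum over a subset of points cannot exceed the supremum over a larger set. The Python sketch below estimates $P\{n^{1/2}\max |S_n - F| \le r\}$ over the lattice $A_k$ for two values of $k$, using the same simulated samples. It shows that the coarser lattice never gives the smaller estimate. The sketch is illustrative only:

- The names, the sample size, and the choice $F(x, y) = xy$ (independent uniform coordinates) are assumptions.
- A fine lattice stands in for the full supremum in $G_n$, which this code does not compute.

```python
import random


def edf(sample, x, y):
    """Bivariate sample d.f. S_n(x, y): fraction of points (u, v) with u <= x and v <= y."""
    return sum(1 for u, v in sample if u <= x and v <= y) / len(sample)


def max_dev_on_grid(sample, F, k):
    """Maximum of |S_n - F| over the lattice points (i/k, j/k), 0 <= i, j <= k."""
    return max(abs(edf(sample, i / k, j / k) - F(i / k, j / k))
               for i in range(k + 1) for j in range(k + 1))


def estimate_G(n, k, r, reps=200, seed=0, F=lambda x, y: x * y):
    """Monte Carlo estimate of P{n**0.5 * max over the k-lattice of |S_n - F| <= r}.

    Samples have independent uniform coordinates, whose d.f. is F(x, y) = x * y.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(reps):
        sample = [(rng.random(), rng.random()) for _ in range(n)]
        if n ** 0.5 * max_dev_on_grid(sample, F, k) <= r:
            ok += 1
    return ok / reps
```

With $k = 4$ and $k = 12$, every point of the coarse lattice is also a point of the fine one: `i / 4` and `3 * i / 12` give identical floats. The inequality therefore holds sample by sample, not just on average, when both estimates use the same seed.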

The discussion which immediately follows, as well as Lemma 6, leads up to 
the proof of Lemma 7. 

There are $k^2$ cells into which $I^2$ is divided by the lines $x = i/k$, $y = j/k$, $i, j = 0, 1, \ldots, k$. (There are, of course, $k^m$ cells in the case of vectors with $m$ components.) Number the cells as follows: the cell bounded by $x = (i - 1)/k$, $x = i/k$, $y = (j - 1)/k$, $y = j/k$ is to be called the $(i, j)$ cell. Write

$\pi_{ij} = P_F\{Z_1 \in \text{cell } (i, j)\}$.


Write $(i', j') \le (i, j)$ if $i' \le i$ and $j' \le j$. Write $(i', j') < (i, j)$ if $(i', j') \le (i, j)$ and either $i' < i$ or $j' < j$.

Let $\mathcal{H}$ be any collection of cells. For any fixed $(i_0, j_0)$ not in $\mathcal{H}$, there clearly exist integers $c_{ij}$ (depending only on $\mathcal{H}$ and $(i_0, j_0)$) such that we can write

(2.29) $F(i_0/k, j_0/k) = \sum_{(i,j) \le (i_0,j_0),\ (i,j) \in \mathcal{H}} c_{ij} F(i/k, j/k) + \sum_{(i,j) \le (i_0,j_0),\ (i,j) \notin \mathcal{H}} c_{ij} \pi_{ij}$,

identically in $F$ (i.e., in the $\pi_{ij}$).
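Two facts behind (2.29) can be checked numerically:

- each lattice value $F(i_0/k, j_0/k)$ is the sum of the $\pi_{ij}$ over the cells $(i, j) \le (i_0, j_0)$;
- each $\pi_{ij}$ is a second difference of $F$.

The Python sketch below checks this in exact rational arithmetic and lists the singular cells (defined just below in the text) for a given $\epsilon_2$. It uses a Farlie-Gumbel-Morgenstern d.f., an example chosen here because it has uniform marginals; it is not taken from the paper. The sketch does not construct the integers $c_{ij}$ of (2.29) for a general collection $\mathcal{H}$.

```python
from fractions import Fraction


def fgm(x, y, theta=Fraction(1, 2)):
    """Farlie-Gumbel-Morgenstern d.f. on the unit square (uniform marginals, |theta| <= 1)."""
    return x * y * (1 + theta * (1 - x) * (1 - y))


def cell_probs(F, k):
    """pi[(i, j)] = P_F{Z_1 in cell (i, j)}, the cell (i-1)/k < x <= i/k, (j-1)/k < y <= j/k."""
    pi = {}
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            x0, x1 = Fraction(i - 1, k), Fraction(i, k)
            y0, y1 = Fraction(j - 1, k), Fraction(j, k)
            # Second difference of F over the rectangle: its probability.
            pi[(i, j)] = F(x1, y1) - F(x0, y1) - F(x1, y0) + F(x0, y0)
    return pi


def F_at_lattice(pi, i0, j0):
    """F(i0/k, j0/k) recovered as the sum of pi_ij over the cells (i, j) <= (i0, j0)."""
    return sum((p for (i, j), p in pi.items() if i <= i0 and j <= j0), Fraction(0))


def singular_cells(pi, eps2, k):
    """Cells other than (k, k) with pi_ij < eps2, in the terminology of the text."""
    return sorted(c for c, p in pi.items() if p < eps2 and c != (k, k))
```

With $k = 4$ and $\epsilon_2 = 1/16$, exactly the eight cells with one index in $\{1, 2\}$ and the other in $\{3, 4\}$ are singular for this example.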

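Let $\epsilon_2 > 0$ be given. Call the cell $(i, j)$ regular if $\pi_{ij} \ge \epsilon_2$ and $(i, j) \ne (k, k)$. Call the cell $(i, j)$ singular if $\pi_{ij} < \epsilon_2$ and $(i, j) \ne (k, k)$. Let $\mathcal{H}_F$ be the collection of regular cells, let $(i_0, j_0)$ be singular under $F$, and let the $c_{ij}$ be as in (2.29) with $\mathcal{H} = \mathcal{H}_F$. Denote a summation over the region $(i, j) \le (i_0, j_0)$, $(i, j) \in \mathcal{H}_F$, by $\sum^*$. Then, clearly (the difference is the last sum in (2.29), each term of which involves a singular $\pi_{ij} < \epsilon_2$),

(2.30) $\bigl|F(i_0/k, j_0/k) - \sum^* c_{ij} F(i/k, j/k)\bigr| < h(k)\epsilon_2$.

Here $h$ is a suitable positive function of $k$ alone, which can be chosen so that (2.30) is valid for every $\epsilon_2$, every $F$, and every $(i_0, j_0)$ singular for such an $F$. The collection $\mathcal{H}_F$ depends on the $F$ and $\epsilon_2$ being considered, but the $c_{ij}$ depend on these quantities only through $\mathcal{H}_F$.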

Define $Q_{r,n}(\epsilon_2)$ to be the probability that

(2.31) $|S_n(i/k, j/k) - F(i/k, j/k)| < r/n^{1/2}$ for all $(i, j)$ in $\mathcal{H}_F$, and
$\bigl|\sum^* c_{ij}[S_n(i/k, j/k) - F(i/k, j/k)]\bigr| < r/n^{1/2}$ for all $(i_0, j_0) \ne (k, k)$ not in $\mathcal{H}_F$.


The proof of the next lemma is actually valid when $\mathfrak{F}^*$ is replaced by the class of all d.f.'s on $I^2$.

Lemma 6. We have

(2.32) $|Q_{r,n}(\epsilon_2) - G_{nk}(r; F)| = o(1 \mid \epsilon_2, N(\epsilon_2) \mid F \in \mathfrak{F}^*)$.

Proof. Define, for $(i_0, j_0)$ singular,

(2.33) $U = S_n(i_0/k, j_0/k)$

and

(2.34) $V = \sum^* c_{ij} S_n(i/k, j/k)$.

Let $B$ be the event defined by

(2.35) $B = \{|(U - V) - E(U - V)| < \epsilon_2^{1/4}/n^{1/2}\}$.

Now, $U - V$ is just the last sum of (2.29) with $\pi_{ij}$ replaced by $n^{-1}$ (number of $Z_1, \ldots, Z_n$ falling in cell $(i, j)$). Hence, $U - V$ has variance $< h'(k)\epsilon_2/n$, where $h'$ is a suitable positive function of $k$. Thus, by Chebyshev's inequality (the variance bound divided by the squared threshold $\epsilon_2^{1/2}/n$ is $h'(k)\epsilon_2^{1/2}$),

(2.36) $P_F\{B\} > 1 - h'(k)\epsilon_2^{1/2}$.


Of course, the definition of $B$ depends on $F$ as well as on $(i_0, j_0)$; but, again, $h'$ can be chosen so that (2.36) holds for all $F$. Consider the events

(2.37) $A_1 = \{|U - EU| \ge r/n^{1/2},\ |V - EV| < r/n^{1/2}\}$

and

(2.38) $A_2 = \{|U - EU| < r/n^{1/2},\ |V - EV| \ge r/n^{1/2}\}$.

The definition of these events also depends on $F$ and $(i_0, j_0)$. Define $P_{Ft} = P_F\{A_t\}$, $t = 1, 2$. We are first going to show that, for all $F$ for which $(i_0, j_0)$ is singular,

(2.39) $P_{Ft} = o(1 \mid \epsilon_2, N(\epsilon_2) \mid F \in \mathfrak{F}^*)$, for $t = 1, 2$.


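In proving this, let $W$ stand for $U$ in the case $t = 1$ and for $V$ in the case $t = 2$. Then $W$ is $n^{-1}$ times the sum of $n$ independent, identically distributed random variables, each bounded in absolute value by some constant $L$ (independent of $F$). Let $\sigma^2$ be the variance of each summand and $\beta_3$ its absolute third moment about its expected value (i.e., those of $W - EW$ when $n = 1$).

If $\sigma^2 < \epsilon_2^{1/4}$, Chebyshev's inequality yields $P_{Ft} \le r^{-2}\epsilon_2^{1/4}$, so that (2.39) is verified in that case. On the other hand, if $\sigma^2 \ge \epsilon_2^{1/4}$, by (2.36) we have

(2.40) $P_{Ft} \le P_F\{B \cap A_t\} + P_F\{\bar B\} \le P_F\{B \cap A_t\} + h'(k)\epsilon_2^{1/2}$
$\le P_F\{r \le n^{1/2}|W - EW| \le r + \epsilon_2^{1/4}\} + h'(k)\epsilon_2^{1/2}$
$= P_F\{r/\sigma \le n^{1/2}|W - EW|/\sigma \le r/\sigma + \epsilon_2^{1/4}/\sigma\} + h'(k)\epsilon_2^{1/2}$.

By the Berry-Esseen estimate (see, e.g., [5]) and the fact that $\beta_3/\sigma^3 \le L/\sigma$, we have from (2.40), for all $F$ for which $\sigma^2 \ge \epsilon_2^{1/4}$,

(2.41) $P_{Ft} \le h'(k)\epsilon_2^{1/2} + \epsilon_2^{1/8} + c_3 L \epsilon_2^{-1/8} n^{-1/2}$,

where $c_3$ is a positive constant. Thus, (2.39) is proved.

Lemma 6 follows at once from (2.39). The events whose probabilities are $Q_{r,n}(\epsilon_2)$ and $G_{nk}(r; F)$ can differ only on the union of the events $A_1$ and $A_2$, taken over the (at most $k^2$) singular points $(i_0, j_0)$.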

The proof of the next lemma is also valid when $\mathfrak{F}^*$ is replaced by the class of all d.f.'s on $I^2$.

Lemma 7. For any fixed positive integer $k$, we have

(2.42) $|G_{\infty k}(r; F) - G_{nk}(r; F)| = o(1 \mid n \mid F \in \mathfrak{F}^*)$.


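Proof. Let $\epsilon_1 > 0$ be given arbitrarily. Choose $\epsilon_2$ so small and $N(\epsilon_2)$ so large that, for this value of $\epsilon_2$, the left side of (2.32) is $\le \epsilon_1/4$ for all $F$ when $n > N(\epsilon_2)$. Write $Q_r(\epsilon_2) = \lim_{n\to\infty} Q_{r,n}(\epsilon_2)$. We shall show below that

(2.43) $|Q_r(\epsilon_2) - Q_{r,n}(\epsilon_2)| < \epsilon_1/2$

for $n$ sufficiently large, uniformly in $F$. Hence, for $n$ sufficiently large, uniformly in $F$, the left side of (2.42) will be no greater than

$|G_{\infty k}(r; F) - Q_r(\epsilon_2)| + |Q_r(\epsilon_2) - Q_{r,n}(\epsilon_2)| + |Q_{r,n}(\epsilon_2) - G_{nk}(r; F)| < \epsilon_1/4 + \epsilon_1/2 + \epsilon_1/4 = \epsilon_1$.

(The first term is at most $\epsilon_1/4$ by letting $n \to \infty$ in (2.32).) Thus (2.42) will be proved.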

We shall now fix $\mathcal{H}$ and prove that (2.43) holds for $n$ sufficiently large, uniformly in all $F$ for which $\mathcal{H}_F = \mathcal{H}$. Since $k$ is fixed, the number of possible choices of $\mathcal{H}$ is finite, so that Lemma 7 will then be proved.

Consider the joint distribution of the $n^{1/2}(S_n(i/k, j/k) - F(i/k, j/k))$ for all regular $(i, j)$, which, as $n \to \infty$, approaches a multivariate normal distribution. Since $\pi_{ij} \ge \epsilon_2$ for any regular point, the determinant of the covariance matrix of these variables (for regular $(i, j)$) is bounded away from 0 (and, of course, from $\infty$ as well) by a function of $\epsilon_2$, uniformly in all $F$ for which $\mathcal{H}_F = \mathcal{H}$.

It follows from [6], page 121, that the maximum absolute difference between the joint d.f. of these variables and their limiting multivariate normal d.f. is less than $n^{-1/2}M(\epsilon_2)$, where $M$ is a real function of $\epsilon_2$ only. The maximum of the density of this limiting normal d.f. is a real function only of $\epsilon_2$, say $M'(\epsilon_2)$. Both statements are uniform in all $F$ for which $\mathcal{H}_F = \mathcal{H}$.

It follows from (1.8) that a sufficiently large cube $C$ centered at the origin, in the space of the $n^{1/2}(S_n(i/k, j/k) - F(i/k, j/k))$ (for all regular $(i, j)$), has probability greater than $1 - \epsilon_1/12$, uniformly in $F$ and $n$. Hence this is also true of the limiting multivariate normal d.f. of the $n^{1/2}(S_n(i/k, j/k) - F(i/k, j/k))$.

Consider the region $R$ in the space of these $n^{1/2}(S_n(i/k, j/k) - F(i/k, j/k))$ which is defined by (2.31), and whose probability is $Q_{r,n}(\epsilon_2)$. The region $R \cap C$ is a bounded polyhedron. It can be approximated from within by a finite union $R_1$ of "rectangles" with sides parallel to the coordinate planes, such that the volume of the region $R_2 = (R \cap C) - R_1$ is $< \epsilon_3$. Here $\epsilon_3 > 0$ is such that $\epsilon_3 M'(\epsilon_2) < \epsilon_1/12$. The set $R_2$ can be covered by a finite union $R_3$ of rectangles with sides parallel to the coordinate planes, whose total volume is $< 2\epsilon_3$. Let $m_3$ be the number of rectangles in $R_3$, and $m_1$ the number of rectangles in $R_1$.


(2.44) 






The probability of $R_3$ according to the limiting normal d.f. is less than

(2.45) $2\epsilon_3 M'(\epsilon_2) < \epsilon_1/6$.

The probability $P_F\{R_2\}$ of the region $R_2$ according to $F$ is $\le P_F\{R_3\}$. By the aforementioned result of Bergström [6], $P_F\{R_3\}$ differs from the probability of $R_3$ according to the limiting normal d.f. by less than $4m_3 M(\epsilon_2) n^{-1/2}$. Hence,

(2.46) $P_F\{R_2\} < \epsilon_1/6 + 4m_3 M(\epsilon_2) n^{-1/2}$.

Also, by Bergström's result just cited, the probability of $R_1$ according to the limiting normal d.f. differs from $P_F\{R_1\}$ by less than $4m_1 M(\epsilon_2) n^{-1/2}$. The sum of this and the second term in the right member of (2.46) can be made less than $\epsilon_1/6$ by making $n$ sufficiently large. It follows from the present paragraph and the previous two that (2.43) holds for $n$ sufficiently large, uniformly in all $F$ for which $\mathcal{H}_F = \mathcal{H}$. This completes the proof of Lemma 7.

Proof of Lemma 1. Let $\epsilon_0 > 0$ be chosen arbitrarily. Choose $k'$ such that the right side of (2.27) is $< \epsilon_0/3$ for $k = k'$ and $n > N'(k')$. In particular, $0 \le G_{\infty k'}(r; F) - G(r; F) \le \epsilon_0/3$. Choose $N$ to be $> N'(k')$ and such that, for $k = k'$, the left member of (2.42) is $< \epsilon_0/3$ for $n > N$. Then, for $n > N$ and all $F$ in $\mathfrak{F}_\epsilon$, we have

(2.47) $|G_n(r; F) - G(r; F)| \le |G_n(r; F) - G_{nk'}(r; F)| + |G_{nk'}(r; F) - G_{\infty k'}(r; F)| + |G_{\infty k'}(r; F) - G(r; F)| < \epsilon_0$.

Since $\epsilon_0$ was arbitrary, Lemma 1 is proved.


3. The multinomial result. We have mentioned in Section 1 that the main results of this paper are obtained by approximating the original problem by an appropriate multinomial problem. In the present section we summarize the needed multinomial results, which were obtained in [1], and sketch the ideas of the proofs, unencumbered by the tedious details of [1].

Actually, we do not need the full strength of the results of [1]. Those results are broader than Lemma 8 below, because in Section 3 of [1] the calculations were carried out in fine detail in order to obtain an error term. That error term can be used to calculate an upper bound on the departure of $\varphi_n^*$ from minimax character. (In view of the lack of knowledge about the distribution of $D_n$, it seems more difficult to obtain a useful bound of this kind when $m > 1$.)

In fact, if one does not bother to obtain an error term, it is obvious how to shorten considerably the proof of the multinomial result in Section 3 of [1]. We shall see that this simple multinomial result without error term rests mainly on a result of v. Mises ([7], especially pages 84-86), which is almost forty years old.

We now introduce the needed notation. Let $h$ be a positive integer and let $B_h$ be the family of $(h+1)$-vectors $\pi = \{p_i, 1 \le i \le h+1\}$ with real components satisfying $p_i \ge 0$, $\sum p_i = 1$. Let $B'_h$ be a specified subset of $B_h$. The set $B'_h$ can actually be fairly arbitrary in structure; to avoid trivial circumlocutions, we shall suppose in this section that $B'_h$ is the closure of an $h$-dimensional open
subset of $B_h$, although it will be obvious that Lemmas 8, 9, and 10 hold much more generally. Let $T^{(n)} = \{T_i^{(n)}, 1 \le i \le h+1\}$, a vector of $h + 1$ chance variables, have a multinomial probability function arising from $n$ observations with $h + 1$ possible outcomes, according to some $\pi$ in $B'_h$. That is, for integers $x_i \ge 0$ with $\sum_1^{h+1} x_i = n$,

(3.1) $P_\pi\{T_i^{(n)} = x_i,\ 1 \le i \le h+1\} = \dfrac{n!}{x_1! \cdots x_{h+1}!}\, p_1^{x_1} \cdots p_{h+1}^{x_{h+1}}$.
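Formula (3.1) can be evaluated directly. The Python helper below is a sketch with a hypothetical name. It works in log space through `math.lgamma`, so that large $n$ does not overflow the factorials, and it treats zero probabilities explicitly.

```python
import math


def multinomial_pmf(counts, probs):
    """P_pi{T_i = x_i, 1 <= i <= h+1} as in (3.1), computed in log space for stability."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    log_p = math.lgamma(n + 1)              # log n!
    for x, p in zip(counts, probs):
        if x == 0:
            continue                        # factor p**0 / 0! == 1, even when p == 0
        if p == 0.0:
            return 0.0                      # positive count in a cell of probability zero
        log_p += x * math.log(p) - math.lgamma(x + 1)
    return math.exp(log_p)
```

As a check, summing the helper over all $x_1 + x_2 + x_3 = 4$ with $\pi = (0.2, 0.3, 0.5)$ returns 1 up to rounding, as a probability function must.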


Let $L$ be a positive integer, let $\gamma_i$ be an $(h+1)$-vector, $1 \le i \le L$, and let $\mu_i = \gamma_i'\pi$ (scalar product) be the corresponding linear functions of $\pi$, $1 \le i \le L$. To avoid trivialities, we assume at least one $\mu_i$ is not constant on $B'_h$. Let $\mathcal{E}_n$ be the class of all (possibly randomized) vector estimators of $\mu = \{\mu_i, 1 \le i \le L\}$. The weight function (which depends on $n$) is the simple one for which the risk function of a procedure $\psi_n$ in $\mathcal{E}_n$ is

(3.2) $1 - P_{\pi,\psi_n}\{|d_i - \mu_i| \le r/n^{1/2},\ 1 \le i \le L\}$,

where $r$ is a positive value and we have written $d = \{d_i, 1 \le i \le L\}$ for the vector of decisions.

Let $\psi_n^*$ be the nonrandomized estimator whose $i$th component is $\gamma_i'T^{(n)}/n$. (The allowable decisions may be restricted to values $\gamma_i'\pi$ for $\pi$ in $B'_h$, with only trivial modifications in what follows.) Finally, a point $\pi$ in $B'_h$ is called an interior point if all its components $p_i$ are positive and if it has a neighborhood (in $B_h$) which is a subset of $B'_h$. The required multinomial result is:

Lemma 8. For any interior point $\pi^*$ of $B'_h$, there is a sequence $\{\xi_n\}$ of a priori distributions on $B'_h$ with two properties. First, $\xi_n$ converges in distribution to the distribution which gives probability one to $\pi^*$. Second, $\{\psi_n^*\}$ is asymptotically Bayes relative to $\{\xi_n\}$ as $n \to \infty$, uniformly for $0 \le r \le R$ for any $R < \infty$. That is, uniformly in such $r$,

(3.3) $\lim_{n\to\infty} \dfrac{\int P_\pi\{\max_i |\mu_i - \gamma_i'T^{(n)}/n| > r/n^{1/2}\}\, \xi_n(d\pi)}{\inf_{\psi_n \in \mathcal{E}_n} \int P_{\pi,\psi_n}\{\max_i |d_i - \mu_i| > r/n^{1/2}\}\, \xi_n(d\pi)} = 1$.

Of course, continuity considerations show how to obtain the limit of the numerator of (3.3), which is positive since not all $\mu_i$ are constant. One puts $\pi = \pi^*$ instead of integrating with respect to $\xi_n$, and then uses the multivariate central limit theorem to compute the limiting probability.

The idea of the proof of Lemma 8 is very simple. Let $\Gamma^*$ be the (nonsingular) covariance matrix of the limiting $h$-variate normal distribution of $n^{1/2}(n^{-1}T_i^{(n)} - p_i^*)$, $1 \le i \le h$, when $\pi = \pi^*$. Let $\epsilon$ be a small positive value and let $\xi$ be the uniform a priori distribution in the (solid) sphere of radius $\epsilon$ about $\pi^*$ in $B_h$. ($\epsilon$ is small enough that this sphere consists entirely of interior points.)

According to the result [7] of v. Mises, the following holds for any $\pi''$ in this sphere, with probability one when $T^{(n)}$ is distributed according to $\pi''$. The a posteriori density function of $n^{1/2}(p_i - T_i^{(n)}/n)$, $1 \le i \le h$ (calculated assuming $\xi$ to be the a priori distribution), will tend to the $h$-variate normal density with means 0 and
covariance matrix $\Gamma''$ (corresponding to $\pi''$) as $n \to \infty$. Suppose the a posteriori density were really normal with the stated parameters. Then it would follow at once from a result [8] of Anderson that the a posteriori probability of the event

(3.4) $\{n^{1/2}|\mu_i - d_i| \le r,\ 1 \le i \le L\}$

(this probability is unity minus the a posteriori risk) is a maximum for $d_i = \gamma_i'T^{(n)}/n$. The reason is that, for each $d$, the region (3.4) is a convex subset of the space of the $h$ variables $n^{1/2}(p_i - T_i^{(n)}/n)$, symmetric about a point depending on $d$ (the latter variables being considered unrestricted in magnitude). Since the actual a posteriori density is almost normal (with high probability as $n \to \infty$), $\psi_n^*$ will be asymptotically Bayes.

Finally, let $\xi_n$ be the $\xi$ just described when $\epsilon = \epsilon_n$, where $\epsilon_n$ goes to zero slowly enough that the above result still holds for $\psi_n^*$ as $n \to \infty$. (For example, $\epsilon_n = n^{-a}$ with $0 < a < 1/2$. The crucial consideration is that the radius $n^{1/2}\epsilon_n$ of the set of possible values of $n^{1/2}(\pi - \pi^*)$ approaches infinity with $n$. The same is therefore true, w.p.1 under $\xi_n$, of the radius of the set of possible values of $n^{1/2}(\pi - T^{(n)}/n)$. The asymptotic problem is thus approximately one of estimating the mean of a multivariate normal distribution with known constant covariance matrix, when the mean can take on any value in an appropriate Euclidean space.)
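The v. Mises phenomenon invoked above can be seen explicitly in a conjugate case. The Python sketch below swaps in a Dirichlet$(1, \ldots, 1)$ prior, i.e., the uniform distribution on all of $B_h$, instead of the uniform prior on a small sphere about $\pi^*$ used in the text. The reason for the swap is that the posterior is then Dirichlet$(T_1^{(n)} + 1, \ldots, T_{h+1}^{(n)} + 1)$, with closed-form moments. The sketch:

- compares the posterior covariance of $n^{1/2}(p_i - T_i^{(n)}/n)$ with the limiting matrix $p_i(\delta_{ij} - p_j)$, evaluated at $T^{(n)}/n$;
- checks by simulation the normal coverage of a nominal 95% interval.

The counts, seed, and function names are arbitrary illustrative choices.

```python
import math
import random


def posterior_moments(counts):
    """Posterior mean and covariance of n**0.5 * (p_i - T_i/n) under a Dirichlet(1,...,1)
    prior, using the Dirichlet(T_i + 1) posterior."""
    n = sum(counts)
    alpha = [t + 1 for t in counts]
    a0 = sum(alpha)
    mean = [math.sqrt(n) * (a / a0 - t / n) for a, t in zip(alpha, counts)]
    denom = a0 * a0 * (a0 + 1)
    # Dirichlet covariance: (alpha_i * a0 * delta_ij - alpha_i * alpha_j) / (a0**2 (a0 + 1)).
    cov = [[n * ((a0 * ai if i == j else 0) - ai * aj) / denom
            for j, aj in enumerate(alpha)] for i, ai in enumerate(alpha)]
    return mean, cov


def limiting_cov(counts):
    """Gamma_ij = p_i (delta_ij - p_j) evaluated at p = T/n."""
    n = sum(counts)
    p = [t / n for t in counts]
    return [[pi * ((1.0 if i == j else 0.0) - pj) for j, pj in enumerate(p)]
            for i, pi in enumerate(p)]


def posterior_coverage(counts, draws=4000, seed=0):
    """Fraction of posterior draws with |n**0.5 (p_1 - T_1/n)| < 1.96 * sd,
    where sd is taken from the normal limit."""
    rng = random.Random(seed)
    n = sum(counts)
    p1 = counts[0] / n
    sd = math.sqrt(p1 * (1 - p1))
    inside = 0
    for _ in range(draws):
        g = [rng.gammavariate(t + 1, 1.0) for t in counts]   # Dirichlet draw via gammas
        z = math.sqrt(n) * (g[0] / sum(g) - p1) / sd
        inside += abs(z) < 1.96
    return inside / draws
```

With counts $(200, 300, 500)$, the posterior covariance entries agree with the normal limit to within about 0.001, and the simulated coverage is close to 0.95.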

The actual proof—the precise handling of the approximations mentioned 
above, the uniformity in r, etc.—may be handled as in [1] or by complementing 
with appropriate estimates the argument of [7], but the main idea is really the 
simple one of [7]. 

The reason for wanting Lemma 8 in its stated form, with the sequence $\{\xi_n\}$ shrinking down on $\pi^*$, has to do with the problem of multinomial minimax estimation for the risk function (3.2). Let $\pi^0$ be the value of $\pi$ at which the positive limit $b$ (as $n \to \infty$) of the continuous risk function of $\psi_n^*$ is a maximum. Since the $\mu_i$ are not all constant, for any $\delta > 0$ there will, by continuity, be an interior point $\pi^*$ of $B'_h$ at which the limit of the risk function of $\psi_n^*$ is at least $b/(1 + \delta)$. From Lemma 8 and the sentence following (3.3) we conclude:

Lemma 9. $\{\psi_n^*\}$ is asymptotically minimax relative to $B'_h$ and the risk function (3.2).

We next consider a generalization of this result to other weight functions which are nondecreasing functions of $\max_i |d_i - \mu_i|$. Of course, the risk function is defined in the usual way. (A Bayes result analogous to Lemma 8 can be proved in the course of the demonstration, but we shall not bother to state it.) Let $C_0$ and $C$ be positive constants such that

(3.5) $P_\pi\{n^{1/2}\max_i |\gamma_i'T^{(n)}/n - \mu_i| \ge r\} < C_0 e^{-Cr^2}$

for all $r$, all $n$, and all $\pi$ in $B'_h$. Such positive constants exist, depending on $h$ and the structure of $B'_h$; this follows from well known results on the multinomial (or, in fact, the binomial) distribution. In Section 4 we shall actually refer to (1.8) for appropriate values of these constants.

Lemma 10. Let $W(r)$ be a nondecreasing real function of $r$ for $r \ge 0$, not identically zero, and satisfying

(3.6) $\int_0^\infty W(r)\, r e^{-Cr^2}\, dr < \infty$.

Then $\{\psi_n^*\}$ is asymptotically minimax relative to $B'_h$ and the weight functions

(3.7) $W_n(\pi, d) = W(n^{1/2}\max_i |\mu_i - d_i|)$.


The proof of Lemma 10 can be carried out, starting from scratch, along lines like those of Lemma 8. An easier proof, which was given in [1], reduces the proof essentially to that for the simple weight function already considered in Lemma 8.

Specifically, suppose the a posteriori distribution of the variables $n^{1/2}(p_i - T_i^{(n)}/n)$ were actually normal with means 0 and the appropriate covariance matrix. Then $d_i = \gamma_i'T^{(n)}/n$, $1 \le i \le L$, would minimize the a posteriori risk. To see this, suppose this choice of the $d_i$ did not minimize the a posteriori risk. Let $H_1$ and $H_2$ be, respectively, the d.f.'s of $n^{1/2}\max_i |\mu_i - d_i|$ for the above choice of $d_i$ and for a better choice. Then we would have

$\int_0^\infty W(r)\, d[H_1(r) - H_2(r)] > 0$,

which is easily seen to imply that $H_1(r') < H_2(r')$ for some $r'$. This contradicts Anderson's result cited previously (i.e., when the error terms are included, it contradicts the result of Lemma 8). The details of the proof are contained in Section 4 of [1].

We note that Lemma 10 exemplifies a principle which is of more general use 
in statistics: If one can verify suitable (asymptotic) Bayes results for an ap- 
propriate class of simple weight functions, the results will automatically hold 
for a general class of monotone weight functions. 

We remark that Anderson's result can be used to prove the result of Lemma 10 for a larger class of weight functions. This class consists of every function of $n^{1/2}(d_i - \mu_i)$, $1 \le i \le L$, which is symmetric about the origin and which, for each real value $c$, has a convex (or empty) set as the domain where the function is $\le c$.


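4. Proof of Theorem 1 when $\mathfrak{F}$ is replaced by $\mathfrak{F}'_\epsilon$. Define $\mathfrak{F}'_\epsilon$ to consist of every d.f. in $\mathfrak{F}^\circ$ which gives probability one to $I^m$ and which can be realized as the d.f. of $Z'_1$ (say) when $Z_1$ has a d.f. in $\mathfrak{F}_\epsilon$ and $Z'_1$ is obtained from $Z_1$ by continuous monotonic transformations on the individual coordinate functions. Thus, $\mathfrak{F}'_\epsilon \supset \mathfrak{F}_\epsilon$, but $\mathfrak{F}'_\epsilon$ includes d.f.'s which are not in $\mathfrak{F}^*$. Clearly, for any $F'$ in $\mathfrak{F}'_\epsilon$, there is an $F$ in $\mathfrak{F}_\epsilon$ such that $G_n(r; F) = G_n(r; F')$ for all $n$ and $r$.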

In this section we use the results of Sections 2 and 3 to prove the following 

Lemma 11. For $m$ a positive integer, suppose that

(4.1) $W_n(F, g) = W(n^{1/2}\sup_z |F(z) - g(z)|)$,

where $W(r)$ for $r \ge 0$ is continuous, nonnegative, nondecreasing in $r$, not identically
zero, and satisfies (1.4). Then, for each $\epsilon$ with $0 < \epsilon < 1$, $\{\varphi_n^*\}$ is asymptotically minimax relative to $\{W_n\}$ and $\mathfrak{F}'_\epsilon$.

Proof. We divide the proof into three paragraphs; $\epsilon$ is fixed in what follows.

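1. By (1.4), (1.8), the last sentence of the first paragraph of this section, and Lemma 1, the function $r_n(F, \varphi_n^*)$ approaches a bounded limit as $n \to \infty$, uniformly for $F$ in $\mathfrak{F}'_\epsilon$. (This limit is positive, by the known results in the case $m = 1$.) Hence, for any $\delta > 0$, there is a d.f. $F_\delta$ in $\mathfrak{F}_\epsilon$ and an integer $N_\delta$ such that

(4.2) $\sup_{F \in \mathfrak{F}'_\epsilon} r_n(F, \varphi_n^*) < (1 + \delta)\, r_n(F_\delta, \varphi_n^*)$

for $n > N_\delta$. Define

(4.3) $r_{nk}(F, \varphi_n) = E_{F,\varphi_n} W(n^{1/2}\sup_{z \in A_k} |F(z) - g(z)|)$,

so that

(4.4) $r_{nk}(F, \varphi_n^*) = \int_0^\infty W(r)\, d_r G_{nk}(r; F)$.

Since $r_{nk} \le r_n$, it follows from (4.2) and the arbitrariness of $\delta$ that Lemma 11 will be proved if we show that

(4.5) $\liminf_{k\to\infty} \liminf_{n\to\infty} \inf_{\varphi_n \in D_n} \sup_{F \in \mathfrak{F}'_\epsilon} r_{nk}(F, \varphi_n) \ge \lim_{n\to\infty} r_n(F_\delta, \varphi_n^*)$.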
2. Define

(4.6) $r_k^* = \int_0^\infty W(r)\, d_r G_{\infty k}(r; F_\delta)$

and

(4.7) $r^* = \int_0^\infty W(r)\, d_r G(r; F_\delta)$.


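Let $\mathfrak{F}'_{\epsilon k}$ be the subset of $\mathfrak{F}'_\epsilon$ consisting of every absolutely continuous d.f. in $\mathfrak{F}'_\epsilon$ whose density function is constant on each of the $k^m$ open $m$-cubes of side $1/k$ in $I^m$ whose corners are points of $A_k$. From equations (1.4) through (1.8) and the fact that $F_\delta \in \mathfrak{F}^*$, we have

(4.8) $\lim_{n\to\infty} r_{nk}(F_\delta, \varphi_n^*) = r_k^*$

and

(4.9) $\lim_{n\to\infty} r_n(F_\delta, \varphi_n^*) = r^* = \lim_{k\to\infty} r_k^*$.

Let $F_{\delta k}$ be that member of $\mathfrak{F}'_{\epsilon k}$ for which $F_{\delta k}(z) = F_\delta(z)$ whenever $z \in A_k$. Clearly, for each $k$ and $n$,

(4.10) $r_{nk}(F_{\delta k}, \varphi_n^*) = r_{nk}(F_\delta, \varphi_n^*)$.

From equations (4.6) through (4.10) and the fact that $\mathfrak{F}'_{\epsilon k} \subset \mathfrak{F}'_\epsilon$, we see that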
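(4.5) will be proved if we show that, for each fixed $k > 1$,

(4.11) $\liminf_{n\to\infty} \inf_{\varphi_n \in D_n} \sup_{F \in \mathfrak{F}'_{\epsilon k}} r_{nk}(F, \varphi_n) \ge \lim_{n\to\infty} r_{nk}(F_{\delta k}, \varphi_n^*)$.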

A sufficient statistic for $\mathfrak{F}'_{\epsilon k}$ based on $Z^{(n)}$ is the collection $T^{(n)}$ of $k^m$ real random variables equal to the numbers of components of $Z^{(n)}$ taking on values in each of the $k^m$ cubes just described. We may therefore replace $D_n$ in (4.11) by the class $D_{nk}$ of decision functions depending only on $T^{(n)}$. The definition of $r_{nk}$ then shows that the left side of (4.11) may be viewed as the limiting minimax risk associated with the problem of estimating certain linear combinations of multinomial probabilities.

To make the correspondence explicit, put $h + 1 = k^m$ in Section 3. Think of the $p_i$ as being assigned to the $k^m$ cubes, and of the $L = (k + 1)^m$ quantities $\mu_i$ as being the values of the unknown d.f. at the $(k + 1)^m$ points in $A_k$. Then the left side of (4.11), without the limit in $n$, may be identified with the minimax risk for a multinomial problem with the setup of Lemma 10. (We shall discuss $B'_h$ and the $C$ of (3.5) in the next paragraph.)
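The sufficiency argument above says that, for a d.f. with a density constant on each cube, everything relevant is carried by the cell counts $T^{(n)}$. In particular, the sample d.f. at the points of $A_k$ is a fixed linear function of $T^{(n)}$, which is what makes the problem an instance of Section 3. The Python sketch below, for $m = 2$ with hypothetical helper names, computes the counts and recovers $S_n$ on the lattice from them. It then checks the result against a direct computation.

```python
import math
import random


def cell_index(c, k):
    """1-based index i with (i-1)/k < c <= i/k, for a coordinate c in (0, 1]."""
    return min(k, max(1, math.ceil(c * k)))


def cell_counts(sample, k):
    """T: number of sample points falling in each of the k**2 cells (i, j)."""
    T = {(i, j): 0 for i in range(1, k + 1) for j in range(1, k + 1)}
    for x, y in sample:
        T[(cell_index(x, k), cell_index(y, k))] += 1
    return T


def lattice_edf_from_counts(T, n, k):
    """S_n(i0/k, j0/k) at every lattice point, computed from the cell counts alone."""
    return {(i0, j0): sum(c for (i, j), c in T.items() if i <= i0 and j <= j0) / n
            for i0 in range(k + 1) for j0 in range(k + 1)}


def lattice_edf_direct(sample, k):
    """The same quantities computed directly from the raw sample."""
    n = len(sample)
    return {(i0, j0): sum(1 for x, y in sample if x <= i0 / k and y <= j0 / k) / n
            for i0 in range(k + 1) for j0 in range(k + 1)}
```

The exact agreement in the check relies on $k$ being a power of two, so that $x \cdot k$ and $i_0/k$ are computed without rounding. For other $k$, a point lying within rounding error of a cell boundary could be assigned inconsistently. That event has probability zero for a continuous $F$, but floating point does not exclude it.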

3. Fix $k > 1$. For any $F$ in $\mathfrak{F}'_{\epsilon k}$, let $\pi_F$ be the associated multinomial probability vector whose components are the $p_i$ described in the previous paragraph. Let $B'_h$ be the set of all such $\pi_F$ for $F$ in $\mathfrak{F}'_{\epsilon k}$. From the definitions of $\mathfrak{F}'_\epsilon$ and $\mathfrak{F}'_{\epsilon k}$, it is clear that $B'_h$ is a closed convex $h$-dimensional subset of the $h$-dimensional set $B_h$. It thus satisfies the requirements of Lemma 10. For the $\mu_i$ defined in the previous paragraph, we can clearly take the $C$ and $C_0$ of (3.5) to be the $C_m$ and $c_m$ of (1.8).

Consider, for each $k$, the multinomial problem of Section 3 where $B'_h$ and the $\mu_i$ are as described above and $W$ is the function given in the statement of Lemma 11. From Lemma 10 we have

(4.12) $\lim_{n\to\infty} \inf_{\psi_n \in \mathcal{E}_n} \sup_{\pi \in B'_h} r'(\pi, \psi_n) = \lim_{n\to\infty} \sup_{\pi \in B'_h} r'(\pi, \psi_n^*)$,

where we have written $r'$ for the risk function in the multinomial problem. Note that $r'(\pi_F, \psi_n^*) = r_{nk}(F, \varphi_n^*)$. The left sides of (4.11) and (4.12) are equal, because of the correspondence of $\mathcal{E}_n$ to $D_{nk}$, of $r'$ to $r_{nk}$, and of $B'_h$ to $\mathfrak{F}'_{\epsilon k}$. Moreover, the right side of (4.12) is at least $\lim_n r'(\pi_{F_{\delta k}}, \psi_n^*)$, which is the right side of (4.11). Hence (4.11) follows from (4.12), and Lemma 11 is proved.


5. Completion of the proof of Theorem 1; passage to the limit with $\epsilon$. We now complete the proof of Theorem 1 by showing that $\mathfrak{F}_\epsilon$ is suitably dense in $\mathfrak{F}^*$ (and hence that $\mathfrak{F}'_\epsilon$ is suitably dense in $\mathfrak{F}^\circ$) as $\epsilon \to 0$. We require two lemmas to do this.

As in Section 2, the proof of the next two lemmas is very similar for all m > 1, 
but is most briefly written out when m = 2. For simplicity of presentation, we 
shall therefore again write out the details only in the case m = 2, and shall state 
explicitly all modifications for the case m > 2 which are not completely obvious. 

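Let $\mathfrak{F}'$ denote the class of all d.f.'s on $I^2$ (in the general case, on $I^m$). For $F$ in $\mathfrak{F}'$ and $0 < \epsilon < 1$, define

(5.1) $\bar F(x, y) = (1 - \epsilon)F(x, y/(1 - \epsilon))$, for $y \le 1 - \epsilon$;
$\bar F(x, y) = (1 - \epsilon)F(x, 1) + x(y - 1 + \epsilon)$, for $y \ge 1 - \epsilon$.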

We shall not display the dependence on $\epsilon$ of the bar operation defined by (5.1). Let $F \in \mathfrak{F}^*$. Perform the bar operation of (5.1) on $F$ to obtain $\bar F$. Then, interchanging the roles of $x$ and $y$, perform the bar operation on $\bar F$ to obtain $F^*$ (say). We clearly have $F^* \in \mathfrak{F}_\epsilon$. (In the case of chance vectors with $m$ components, $F^*$ is obtained after $m$ such steps.)

Let $Z_1, \ldots, Z_n$ be independent chance vectors with the common d.f. $\bar F$, let $\bar S_n$ be their sample (empiric) d.f., and define

$\bar D_n = \sup_{z \in I^2} |\bar S_n(z) - \bar F(z)|$.

Also, define $m = m(n, \epsilon)$ to be the greatest integer $\le n(1 - \epsilon)$.
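The bar operation (5.1) rescales the mass of $F$ into the strip $y \le 1 - \epsilon$ and adds a layer of uniform density carrying mass $\epsilon$ above it. Applied to a d.f. on $I^2$ with uniform marginals, it keeps both marginals uniform. The Python sketch below implements (5.1) and the two-step construction of $F^*$ with exact fractions. It uses a Farlie-Gumbel-Morgenstern d.f. as an assumed example of a continuous d.f. on $I^2$ with uniform marginals, and checks two things on a grid:

- both marginals of $F^*$ are uniform;
- rectangle probabilities under $F^*$ are nonnegative.

```python
from fractions import Fraction


def fgm(x, y, theta=Fraction(1, 2)):
    """A d.f. on the unit square with uniform marginals (Farlie-Gumbel-Morgenstern)."""
    return x * y * (1 + theta * (1 - x) * (1 - y))


def bar(F, eps):
    """The bar operation (5.1), acting on the second coordinate y."""
    def F_bar(x, y):
        if y <= 1 - eps:
            return (1 - eps) * F(x, y / (1 - eps))      # mass of F squeezed below 1 - eps
        return (1 - eps) * F(x, 1) + x * (y - 1 + eps)   # uniform layer of mass eps on top
    return F_bar


def swap(F):
    """Interchange the roles of x and y."""
    return lambda x, y: F(y, x)


def star(F, eps):
    """F*: the bar operation in y, then (after swapping coordinates) the bar operation in x."""
    return swap(bar(swap(bar(F, eps)), eps))
```

Using `Fraction` makes the marginal checks exact equalities rather than floating-point comparisons.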
We now prove the following lemma: 
Lemma 12. We have

(5.2) $P_{\bar F}\{\bar D_n < r/n^{1/2}\} \le P_F\{D_m < [r(1 + \epsilon) + 7\epsilon^{1/4}]/m^{1/2}\} + o(1 \mid \epsilon, N(\epsilon) \mid r, F \in \mathfrak{F}')$.
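Proof. Let $C^*$ be the event

$\{|\bar S_n(1, 1 - \epsilon) - (1 - \epsilon)| < \epsilon^{1/4}/n^{1/2}\}$.

The variance of $\bar S_n(1, 1 - \epsilon)$ is $\epsilon(1 - \epsilon)/n$, so by Chebyshev's inequality the probability that $C^*$ fails is at most $\epsilon^{1/2}(1 - \epsilon)$. Hence

(5.3) $P_{\bar F}\{C^*\} = 1 + o(1 \mid \epsilon \mid n, F \in \mathfrak{F}')$.

For small $\epsilon$ we have

(5.4) $\left|\dfrac{n(1 - \epsilon)}{n(1 - \epsilon) \pm n^{1/2}\epsilon^{1/4}} - 1\right| < 2\epsilon^{1/4}/n^{1/2}$.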


Hence, when $C^*$ occurs and $\epsilon$ is small,

(5.5) $\left|\dfrac{1 - \epsilon}{\bar S_n(1, 1 - \epsilon)} - 1\right| < 2\epsilon^{1/4}/n^{1/2}$.
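Since

(5.6) $\left|\dfrac{(1 - \epsilon)\bar S_n(z)}{\bar S_n(1, 1 - \epsilon)} - \bar S_n(z)\right| \le \left|\dfrac{1 - \epsilon}{\bar S_n(1, 1 - \epsilon)} - 1\right|$,

we have

(5.7) $\bar D_n \ge \sup_{0 \le z \le (1, 1 - \epsilon)} \left|\left[\dfrac{(1 - \epsilon)\bar S_n(z)}{\bar S_n(1, 1 - \epsilon)} - \bar F(z)\right] - \bar S_n(z)\left[\dfrac{1 - \epsilon}{\bar S_n(1, 1 - \epsilon)} - 1\right]\right|$.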

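Also we have, for $y \le 1 - \epsilon$,

(5.8) $E_{\bar F}\left\{\dfrac{n}{m'}\bar S_n(z) \,\Big|\, n\bar S_n(1, 1 - \epsilon) = m'\right\} = \bar F(z)/(1 - \epsilon) = F(x, y/(1 - \epsilon))$.

Hence, given that $n\bar S_n(1, 1 - \epsilon) = m'$, consider the supremum over $0 \le z \le (1, 1 - \epsilon)$ of the absolute value of the first bracketed term on the right side of (5.7). Its conditional d.f. is the same as the d.f. of $(1 - \epsilon)D_{m'}$. In what follows define $M' = n\bar S_n(1, 1 - \epsilon)$.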

Let $m_1$ and $m_2$ be two positive integers with $m_1 < m_2$. We can think of $S_{m_2}$ as being obtained by adjoining $(m_2 - m_1)$ random vectors $Z_i$ to the set of $m_1$ random vectors $Z_i$ which gave rise to a corresponding realization of $S_{m_1}$. Let $\theta'$ denote a value with $0 \le \theta' \le 1$ (the proportion of the adjoined vectors lying below $z$). Then the corresponding values of $S_{m_1}(z)$ and $S_{m_2}(z)$ differ for all $z$ by no more than

$|S_{m_2}(z) - S_{m_1}(z)| = \left|\dfrac{m_1 S_{m_1}(z) + (m_2 - m_1)\theta'}{m_2} - S_{m_1}(z)\right| \le \dfrac{m_2 - m_1}{m_2}$.

Thus, in $C^*$, where $|M' - m| < n^{1/2}\epsilon^{1/4} + 1$, the following holds provided $m >$ some $M(\epsilon)$. For each possible value of $D_m$ there is a corresponding set of values of $D_{M'}$, of the same probability, with $D_m \le D_{M'} + 2\epsilon^{1/4} m^{-1/2}$. (These sets, corresponding to different values of $D_m$, arise from disjoint sets in the space of sequences $\{Z_i\}$.)
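The coupling step above rests on the elementary bound $|S_{m_2}(z) - S_{m_1}(z)| \le (m_2 - m_1)/m_2$ when the $m_2$-sample extends the $m_1$-sample. The Python sketch below checks the bound on a grid of points $z$ for a few pairs $(m_1, m_2)$. The names are hypothetical and independent uniform coordinates are assumed.

```python
import random


def edf(sample, z):
    """Bivariate sample d.f. at z = (x, y)."""
    x, y = z
    return sum(1 for u, v in sample if u <= x and v <= y) / len(sample)


def max_nested_gap(m1, m2, grid, seed=0):
    """Max over grid of |S_{m2}(z) - S_{m1}(z)|, where S_{m2} adjoins m2 - m1 vectors
    to the m1 vectors used by S_{m1}."""
    if not 0 < m1 < m2:
        raise ValueError("need 0 < m1 < m2")
    rng = random.Random(seed)
    vectors = [(rng.random(), rng.random()) for _ in range(m2)]
    first = vectors[:m1]                  # the smaller sample is a prefix of the larger one
    return max(abs(edf(vectors, z) - edf(first, z)) for z in grid)
```

The bound holds for every realization, not only on average, so the check needs no tolerance beyond floating-point rounding.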

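From (5.3), (5.5), (5.7), (5.8), and the discussion of the previous paragraph, we have

(5.9) $P_{\bar F}\{\bar D_n < r/n^{1/2}\} \le \sup_{|m' - m| < n^{1/2}\epsilon^{1/4} + 1} P_F\{(1 - \epsilon)D_{m'} < (r + 4\epsilon^{1/4})/n^{1/2}\} + P_{\bar F}\{(C^*)^c\}$
$\le P_F\{D_m < [r(1 + \epsilon) + 7\epsilon^{1/4}]/m^{1/2}\} + o(1 \mid \epsilon, N(\epsilon) \mid r, F \in \mathfrak{F}')$,

which proves Lemma 12.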
We now prove 
Lemma 13. For $W$ satisfying the assumptions of Theorem 1, we have

(5.10) $\sup_{F \in \mathfrak{F}^*} r_n(F, \varphi_n^*) = \sup_{F \in \mathfrak{F}_\epsilon} r_n(F, \varphi_n^*) + o(1 \mid \epsilon, N'(\epsilon))$.


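Proof. Define $m' = m'(n, \epsilon)$ to be the greatest integer $\le (1 - \epsilon)^2 n$. Use Lemma 12 a second time, to go from $\bar F$ to $F^*$. (A trivial modification is needed, since $m'(n, \epsilon)$ may differ by unity from $m[m(n, \epsilon), \epsilon]$.) We have at once, for any $W$ satisfying (1.4) and the other assumptions of Theorem 1,

(5.11) $r_n(F^*, \varphi_n^*) = r_{m'}(F, \varphi_{m'}^*) + o(1 \mid \epsilon, N(\epsilon) \mid F \in \mathfrak{F}')$.

From (5.11) and the fact that $F^* \in \mathfrak{F}_\epsilon$ if $F \in \mathfrak{F}^*$, we have

(5.12) $\sup_{F \in \mathfrak{F}_\epsilon} r_n(F, \varphi_n^*) = \sup_{F \in \mathfrak{F}^*} r_{m'}(F, \varphi_{m'}^*) + o(1 \mid \epsilon, N(\epsilon))$.

Now, as in the first part of the proof of Lemma 11, $r_n(F, \varphi_n^*)$ approaches a bounded limit as $n \to \infty$, uniformly for $F$ in $\mathfrak{F}_\epsilon$. Hence,

(5.13) $\sup_{F \in \mathfrak{F}_\epsilon} r_n(F, \varphi_n^*) = \sup_{F \in \mathfrak{F}_\epsilon} r_{m'}(F, \varphi_{m'}^*) + o_\epsilon(1 \mid n)$,

where $o_\epsilon(1 \mid n)$ denotes a term which, for each $\epsilon$, goes to 0 as $n \to \infty$ (not necessarily uniformly in $\epsilon$). From (5.12), (5.13), and the fact that $\mathfrak{F}_\epsilon \subset \mathfrak{F}^*$, we obtain

(5.14) $\sup_{F \in \mathfrak{F}^*} r_{m'}(F, \varphi_{m'}^*) = \sup_{F \in \mathfrak{F}_\epsilon} r_{m'}(F, \varphi_{m'}^*) + o(1 \mid \epsilon, N''(\epsilon))$.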

Since the possible values of $m'$ for $n > N''(\epsilon)$ include all integers $> N''(\epsilon)(1 - \epsilon)^2 - 1 = N'(\epsilon)$ (say), Lemma 13 follows from (5.14).

Lemma 14. The statement of Theorem 1 holds with $\mathfrak{F}$ replaced by $\mathfrak{F}^\circ$.

Proof. We have previously alluded to the following fact. If $Z_1$ has a d.f. $F$ in $\mathfrak{F}^\circ$, then by appropriate monotonic transformations on the individual coordinates of $Z_1$ we can obtain a random vector $Z'_1$ (say) with d.f. $F'$ (say) in $\mathfrak{F}^*$ and $G_n(r; F') = G_n(r; F)$ for all $r$ and $n$. Hence,

(5.15) $\sup_{F \in \mathfrak{F}^\circ} r_n(F, \varphi_n^*) = \sup_{F \in \mathfrak{F}^*} r_n(F, \varphi_n^*)$.

Moreover, in the same way we have

(5.16) $\sup_{F \in \mathfrak{F}'_\epsilon} r_n(F, \varphi_n^*) = \sup_{F \in \mathfrak{F}_\epsilon} r_n(F, \varphi_n^*)$.

Lemma 14 now follows at once from Lemma 11, (5.16), Lemma 13, (5.15), and the fact that $\mathfrak{F}'_\epsilon \subset \mathfrak{F}^\circ$.

Proof of Theorem 1. Theorem 1 now follows immediately from Lemma 14 and (1.9).

We remark that the proof of Theorem 1 is clearly valid when $\mathfrak{F}$ is replaced by a suitably large subset.

It is not really necessary to prove Theorem 1 by using (1.9) and proving the result first for $\mathfrak{F}^\circ$ (in Lemma 14). Lemma 13 clearly holds if, in (5.10), we replace $\mathfrak{F}^*$ by $\mathfrak{F}'$ and replace $\mathfrak{F}_\epsilon$ by the class of d.f.'s obtained by substituting $\mathfrak{F}'$ for $\mathfrak{F}^*$ in the definition of $\mathfrak{F}_\epsilon$. One can carry through the arguments of Sections 2 and 4 with this altered definition of $\mathfrak{F}_\epsilon$ (appropriate results from [1] still hold), and obvious analogues of (5.15) and (5.16) then yield Theorem 1.

In Section 7 we shall discuss various modifications of Theorem 1 obtained by 
altering the way in which W depends on F(z) — g(z). 


6. Integral weight functions. For $m > 1$, the procedure $\varphi_n^*$ does not have constant risk for $F$ in $\mathfrak{F}^\circ$ under any common weight functions of the form given in equation (5.1) of [1]. There is therefore no longer any special reason for considering weight functions in which the integrand depends on $F$ in the form considered there. To make the proof of this section as simple as possible, we shall consider here the analogue of the special case of Section 5 of [1] wherein $W(y, z)$ does not depend on $z$. The consideration of more complicated weight functions is relegated to Section 7. Our result is

Theorem 2. Let $W(r)$ be a monotonically nondecreasing nonnegative real function of $r$ for $r \ge 0$ which is not identically zero and which satisfies

(6.1) $\int_0^\infty W(r)\, r e^{-2r^2}\, dr < \infty$.

Then $\{\varphi_n^*\}$ is asymptotically minimax relative to $\mathfrak{F}^\circ$ and the weight functions

(6.2) $W_n(F, g) = \int W(n^{1/2}|F(z) - g(z)|)\, dF(z)$.
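Take the assumed weight $W(r) = r^2$, which satisfies (6.1), to see concretely that the risk of $\varphi_n^*$ under (6.2) is not constant for $m > 1$. Since $E\{n(S_n(z) - F(z))^2\} = F(z)(1 - F(z))$ for each fixed $z$, the risk equals $\int F(1 - F)\,dF$ for every $n$. Its value depends on $F$:

- $1/4 - 1/9 = 5/36$ for independent uniform coordinates;
- $1/2 - 1/3 = 1/6$ when both coordinates equal the same uniform variable;
- $1/6$ for every continuous $F$ when $m = 1$.

The Python sketch below estimates the first two values by simulation. The sample size, replication counts, seed, and names are arbitrary illustrative choices.

```python
import random


def risk_sq_weight(sample_point, F, n, reps=1000, m_eval=50, seed=0):
    """Monte Carlo estimate of r_n(F, phi_n*) for the weight (6.2) with W(r) = r**2.

    Each replication draws Z_1, ..., Z_n from F and approximates the integral
    over dF(z) by averaging over m_eval fresh points z drawn from F.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [sample_point(rng) for _ in range(n)]
        acc = 0.0
        for _ in range(m_eval):
            zx, zy = sample_point(rng)
            s = sum(1 for u, v in sample if u <= zx and v <= zy) / n   # S_n(z)
            acc += n * (s - F(zx, zy)) ** 2
        total += acc / m_eval
    return total / reps


def independent(rng):
    """A point with independent uniform coordinates; its d.f. is x * y."""
    return (rng.random(), rng.random())


def comonotone(rng):
    """A point with both coordinates equal to one uniform variable; its d.f. is min(x, y)."""
    u = rng.random()
    return (u, u)
```

The two estimates differ by about $1/6 - 5/36 = 1/36$, illustrating the non-constant risk mentioned at the start of this section.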






Proof. As in Section 5 of [1], the proof of this theorem is essentially easier than that of Theorem 1, since it is centered about the one-dimensional asymptotic result (6.12) (for each $z$). The analytic details are often like corresponding ones of Section 5 of [1], to which we shall consequently sometimes refer. The proof will be conducted in four numbered paragraphs.

1. Consider the d.f. of $n^{1/2}[S_n(z) - F(z)]$ for all $z$ with $\tfrac12 - |F(z) - \tfrac12| > \delta > 0$ and all $F$ in $\mathfrak{F}^\circ$. This d.f. approaches its continuous limit uniformly. (The $F$-measure of this set of $z$ approaches 1 as $\delta \to 0$, uniformly in $F$.) From this uniformity and (6.1), we conclude at once that $r_n(F, \varphi_n^*)$ has a bounded limit uniformly for $F$ in $\mathfrak{F}^\circ$. Thus (4.2) is satisfied, with $\mathfrak{F}'_\epsilon$ replaced by $\mathfrak{F}^\circ$, for some $F_\delta$ in $\mathfrak{F}^\circ$ (of course, $r_n$ is now to be computed using (6.2)). We can clearly suppose, and hereafter do, that $F_\delta$ is a d.f. on $I^m$. Let $\mathfrak{F}'_{0k}$ denote the class of d.f.'s defined in paragraph 2 of the proof of Lemma 11, with $\epsilon = 0$. Thus, the $B'_h$ of paragraph 3 of that proof now coincides with the $B_h$ there.

2. As in Section 5 of [1], we shall let $\{\xi_{kn}\}$ be a sequence of a priori probability measures on $B_k$ (we shall think of $\mathcal{F}_k$ and $B_k$ interchangeably), and we shall write $P_z^*\{A\}$ for the probability of an event $A$ expressed in terms of $T^{(n)} = (T_1^{(n)}, \dots, T_{h+1}^{(n)})$ when the latter has probability function


(6.3) $$P_z^*\{T_i^{(n)} = t_i^{(n)},\ 1 \le i \le h+1\} = [d(k,n,z)]^{-1} \int_{B_k} f(z,\pi)\, P_\pi\{T_i^{(n)} = t_i^{(n)},\ 1 \le i \le h+1\}\, d\xi_{kn}(\pi);$$


here $P_\pi$ is defined in (3.1) and $f(z,\pi)$ is the Lebesgue density at $z$ (in $I^m$) of the d.f. $F(\cdot,\pi)$ in $\mathcal{F}_k$ corresponding to a given $\pi$ in $B_k$; $d(k,n,z)$ is chosen to make (6.3) a probability function. We take $f(z,\pi)$ to be constant on the interior of each of the $k^m$ cubes in $I^m$. This determines (6.3) for all $z$ with all irrational components (hereafter called irrational $z$), and we may limit all further discussion to such $z$. For each such $z$ and possible value $t^{(n)}$ of $T^{(n)}$, we define


(6.4) $$r_{kn}(z,\varphi,t^{(n)}) = \int_{B_k} E_\varphi W\big(n^{1/2}|g(z) - F(z,\pi)|\big)\, d\xi_{kn}^*(\pi, z, t^{(n)}),$$


where, for Borel subsets $B$ of $B_k$,

(6.5) $$\xi_{kn}^*(B, z, t^{(n)}) = \frac{\int_B f(z,\pi)\, P_\pi\{t^{(n)}\}\, d\xi_{kn}(\pi)}{\int_{B_k} f(z,\pi)\, P_\pi\{t^{(n)}\}\, d\xi_{kn}(\pi)};$$

we have used $P_\pi\{t^{(n)}\}$ to denote the function of (3.1).
For each $n$ and $k$, if $F$ is restricted to be in $\mathcal{F}_k$, we may, as in Section 4, restrict our consideration to procedures $\varphi$ in $D_{kn}$. Denoting expectation with respect to $P_z^*$ by $E_z^*$, we have, as in (5.10) of [1],


(6.6) $$\int r_n(F,\varphi)\, d\xi_{kn}(F) = \int_{I^m} E_z^*\, r_{kn}(z,\varphi,T^{(n)})\, d(k,n,z)\, dz,$$





486 J. KIEFER AND J. WOLFOWITZ 


where $dz$ denotes the differential element of Lebesgue measure on $I^m$. For fixed $n$, $k$, $z$, and $t^{(n)}$, let $r_{kn}^*(z, t^{(n)})$ denote the infimum of (6.4) over $D_{kn}$. In order to prove Theorem 2, according to (6.6) and the discussion of paragraph 1 of this proof, it clearly suffices to show that, for some $\{\xi_{kn}\}$,


(6.7) $$\lim_{k\to\infty}\lim_{n\to\infty} \int_{I^m} E_z^*\, r_{kn}^*(z, T^{(n)})\, d(k,n,z)\, dz = \lim_{n\to\infty} r_n(F_3, \varphi_n^*).$$


3. Fix $k$. Let $\pi_k$ be such that $F(z,\pi_k) = F_3(z)$ for $z$ in $A_k$. We may assume $\pi_k$ is an interior point of $B_k$. For, if $\pi_k$ were not an interior point, let $F_3' = (1-\delta')F_3 + \delta' U$, where $U$ is the uniform d.f. on $I^m$. We see easily that the right side of (6.7) can be decreased by at most a quantity which approaches 0 as $\delta' \to 0$ if $F_3$ is replaced by $F_3'$ there. We could thus replace $\pi_k$ by the interior point $\pi_k'$ corresponding to $F_3'$ (for $\delta'$ small but positive) in what follows. Let $\xi_{kn}$, $n = 1, 2, \dots$, be a sequence of a priori measures on $B_k$ which "shrink down" on $\pi_k$ as the $\xi_n$ of Lemma 8 shrink down on $\pi^*$; e.g., $\xi_{kn}$ is uniform on a sphere
of radius $n^{-1/4}$ about $\pi_k$. It follows at once that
(6.8) $$\lim_{n\to\infty} d(k,n,z) = f(z,\pi_k)$$
at all irrational $z$. Suppose we show that, for any irrational $z$ and any $\epsilon > 0$, there is an $N = N(\epsilon, z, k)$ such that, for $n > N$, $P_z^*$ assigns probability at least $1 - \epsilon$ to a set of $T^{(n)}$ values for which


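(6.9) $$r_{kn}^*(z, T^{(n)}) + \epsilon > \int_{-\infty}^{\infty} W(|y|)\, q(y, \sigma(z,k))\, dy,$$
where $q(y,\sigma) = (2\pi\sigma^2)^{-1/2}\exp(-y^2/2\sigma^2)$ and where $\sigma(z,k)$ is continuous in $z$ and

(6.10) $$\sigma^2(z,k) = F(z,\pi_k)[1 - F(z,\pi_k)] + o(1) \quad (k \to \infty).$$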


Then, writing $V(z,k)$ for the expression on the right side of (6.9), we will clearly have (from (6.10), (6.1), and the continuity of $q$)

(6.11) $$\lim_{n\to\infty} r_n(F_3,\varphi_n^*) = \int_{I^m} \lim_{k\to\infty} V(z,k)\, dF_3(z) = \lim_{k\to\infty}\int_{I^m} V(z,k)\, dF_3(z) = \lim_{k\to\infty}\int_{I^m} V(z,k)\, f(z,\pi_k)\, dz.$$


Thus, an application of Fatou's lemma to the left side of (6.7) shows that (6.11) and (6.8) will imply (6.7). It therefore remains to prove (6.9) for the appropriate values of the arguments there.
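Here is a numerical check of (6.11) in the simplest case. It assumes that the procedure is the sample d.f. $S_n$, that $m = 2$, that $F$ is uniform on $I^2$, and that $W(r) = r^2$. Then $V = \int y^2 q(y,\sigma)\,dy = \sigma^2$ with limiting variance $F(z)[1-F(z)]$. Moreover, $n\operatorname{Var} S_n(z) = F(z)[1-F(z)]$ holds exactly, so the risk equals $\int F(1-F)\,dF = 1/4 - 1/9 = 5/36$ for every $n$. The Monte Carlo sketch below (Python with NumPy) estimates that risk. The function name and the simulation sizes are illustrative assumptions.

```python
import numpy as np


def mc_risk_sq(n, reps=400, n_eval=400, seed=0):
    """Monte Carlo estimate of E n * int (S_n(z) - F(z))^2 dF(z), F uniform on [0,1]^2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        x = rng.uniform(size=(n, 2))          # sample of size n from F
        z = rng.uniform(size=(n_eval, 2))     # evaluation points drawn from F (integrates dF)
        # S_n(z): fraction of sample points that are <= z in both coordinates
        below = np.all(x[None, :, :] <= z[:, None, :], axis=2)
        s_n = below.mean(axis=1)
        f = z[:, 0] * z[:, 1]                 # F(z) = z1 * z2
        total += np.mean(n * (s_n - f) ** 2)
    return total / reps


exact = 1 / 4 - 1 / 9                         # int F(1 - F) dF = E[UV] - E[U^2 V^2]
print(mc_risk_sq(50), exact)
```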

4. The proof of (6.9) is similar to that of Lemma 8. For fixed $z$, the expression of (6.5) is like the a posteriori probability measure of $\pi$ when $\xi_{kn}$ is the a priori measure, except for the factor $f(z,\pi)$. In fact, by the shrinking property of $\xi_{kn}$ as $n \to \infty$ and the nature of $f(z,\pi)$, one obtains in the manner of [7] (see [1] for details) the following normal approximation. For any $\epsilon' > 0$ and for $n$ suitably large, with probability $> 1 - \epsilon'$ under $P_z^*$, consider the joint density according to $\xi_{kn}^*$ of the quantities $Y_i = n^{1/2}(p_i - t_i^{(n)}/n)$, $1 \le i \le h$ (where we have written $\pi = (p_1, \dots, p_{h+1})$). In a spherical region of probability $> 1 - \epsilon'$ under $\xi_{kn}^*$, this density is at least $(1 - \epsilon')$ times the appropriate normal density for which the $Y_i$ have means 0, $\operatorname{var} Y_i = p_{ki}(1 - p_{ki})$, and $\operatorname{cov}(Y_i, Y_j) = -p_{ki}p_{kj}$ (the $p_{ki}$ being the components of $\pi_k$). For $\epsilon'' > 0$, an elementary computation (the details being like those of [1], p. 661, except that now $m > 1$) then shows the following, for a fixed arbitrary irrational $z$. Let $J_{kn}(z)$ be the obvious best linear estimator in $D_{kn}$ of $F(z,\pi)$ for $\pi$ in $\mathcal{F}_k$ (not in general $S_n(z)$, unless $z \in A_k$). The corresponding distribution of $n^{1/2}[F(z,\pi) - J_{kn}(z)]$ then has, with probability $> 1 - \epsilon''$ under $P_z^*$, an absolutely continuous component the magnitude of whose Lebesgue density is at least


(6.12) $$(1 - \epsilon'')\, q(y, \sigma(z,k))$$


on the interval $-1/\epsilon'' < y < 1/\epsilon''$, where $\sigma(z,k)$ is continuous in $z$ and satisfies (6.10). Since $\epsilon''$ is arbitrary, (6.9) follows easily from (6.12) and the trivial one-dimensional case of [8] (see [1] for details; the argument here is easier, since we have not yet included the additional dependence of $W$ on other quantities as in [1] and Section 7 below). Thus, Theorem 2 is proved.
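The normal approximation of paragraph 4 can be seen numerically in a simplified setting. The sketch below makes two simplifications for illustration only: it swaps in a flat Dirichlet prior for the shrinking priors $\xi_{kn}$, and it omits the factor $f(z,\pi)$. The posterior of $\pi$ given the cell counts $t$ is then Dirichlet$(t+1)$, and $n$ times the covariance of $Y_i = n^{1/2}(p_i - t_i/n)$ should be close to $\operatorname{diag}(p) - pp^{\mathsf T}$. The sketch keeps all $h+1$ cells, so the matrix is singular, rather than using the $h$ free coordinates of the text. It assumes NumPy, and the function name is hypothetical.

```python
import numpy as np


def posterior_cov_check(p_true, n, draws=200_000, seed=0):
    """Compare n * Cov of a flat-prior Dirichlet(t + 1) posterior with diag(p) - p p^T."""
    rng = np.random.default_rng(seed)
    p_true = np.asarray(p_true, dtype=float)
    t = rng.multinomial(n, p_true)             # observed cell counts t_i
    post = rng.dirichlet(t + 1, size=draws)    # posterior draws of pi = (p_1, ..., p_{h+1})
    y = np.sqrt(n) * (post - t / n)            # Y_i = n^{1/2} (p_i - t_i / n)
    emp = np.cov(y, rowvar=False)
    theory = np.diag(p_true) - np.outer(p_true, p_true)
    return emp, theory


emp, theory = posterior_cov_check([0.1, 0.2, 0.3, 0.4], n=20_000)
print(np.round(emp, 4))
print(np.round(theory, 4))
```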

It is clear that Theorem 2 remains valid if $\mathcal{F}^{\circ}$ is replaced by a suitably large subset. Further generalizations will be discussed in the next section.


7. Other loss functions. We list a few of the extensions of Theorems 1 and 2 which may be proved by the same methods with only minor modifications and no essential new difficulties in the proof. In fact, our treatment of the case $m > 1$ (compared with the argument of [1]) has been concentrated on the difficulty engendered by the nonconstancy of $r_n(F, \varphi_n^*)$. That nonconstancy (in the counterpart of modification F below) is the only real new difficulty in any of the corresponding generalizations of Section 6 of [1]. The difficulty is more trivial there, where $m = 1$ and the nonconstancy is easier to deal with than in Theorems 1 and 2 above.

A. In Theorem 2, the form of $W$ may be extended. For $m = 1$, the more general form $W(n^{1/2}|F(z) - g(z)|, F(z))$ was considered in Section 5 of [1]. The same form can be considered here, but perhaps the dependence on the second variable is no longer so natural. It may be replaced or supplemented, for example, by a dependence on the values of the marginal d.f.'s at the point $z$. The regularity condition which must be imposed on $W$ in order for our method of proof to hold is, in any event, exactly the obvious analogue of that of Section 5 of [1]. For example, continuity and an appropriate integrability condition (the analogue of (5.5) of [1]) are more than enough.

B. In Theorem 2, W can be replaced by a measure (rather than a density) 
in the second argument of the W of A above (or its replacements, just above). 







For example, when $m = 2$, one might be interested only in the estimation of the deciles of the marginal d.f.'s $F_1$ and $F_2$ (say). At each decile $r$ of $F_1$, one might also estimate the deciles of the d.f. $F(r, y)/F_1(r)$, together with its counterpart with $x$ and $y$ interchanged.

C. An analogue of Theorem 2 (with any of the modifications noted above) for 
§ rather than $ is perhaps not too natural (see [1] for further comments), but 
can be given under suitable assumptions. An analogue of Theorem 1 or Theorem 
2 for the class of purely discrete d.f.'s (e.g., on $R^m$, or on the integral lattice points of $R^m$) can also be given. For example, the former essentially follows from the fact that there is a discrete d.f. at which $\varphi_n^*$ has almost the same risk as at $F_3$ when $n$ is large (see (1.4) through (1.8)).

D. In Theorem 1, one can replace $D_n$ by $\sup_z \{|g(z) - F(z)|\, h(F(z))\}$, where $h$ is a suitably regular nonnegative function. Its dependence on $F(z)$ may be replaced, e.g., by a dependence on the marginal d.f.'s, as in A above, and a linear combination of such functions can also be employed. If $h$ takes on only the values 0 and 1, this modification amounts to taking the supremum of the deviation over a suitable subset of $R^m$ whose description depends on $F$.

E. In Theorem 1, one could consider the measures $P$, $Q_n$, and $g^*$ corresponding to $F$, $S_n$, and $g$. One could then let $W$ depend on $\sup_A |P(A) - g^*(A)|$, where the supremum is taken over a suitable family of sets, e.g., rectangles with sides parallel to the coordinate axes. This presents no new difficulties.

F. The function $h$ of D above, the second argument of $W$ in A above, and the integrating measure of (6.2) can all be changed so as to depend only on $z$ and not on $F(z)$ (or they can depend on both). This requires no new arguments, only obvious regularity conditions as on p. 664 of [1]. It is again the existence of an $F_3$ which is the crucial point.

G. The remarks on the sequential asymptotic minimax character of $\varphi_n^*$ for suitable weight functions, which are contained on pp. 664-665 of [1], hold here without change.

H. Obvious combinations of the types of dependence of $W_n$ on $F$ and $z$ which occur in Theorems 1 and 2 and in the previous remarks can be considered with no essential new difficulty. In fact, the asymptotic minimax character of $\varphi_n^*$ seems to hold for a very general class of weight functions. The discussion of p. 664 of [1] indicates the possible breadth of that class. However, we are even further than we were in the case $m = 1$ of [1] from being able to give a single simple, unified proof.


REFERENCES 


[1] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator,” Ann. Math. Stat., Vol. 27 (1956), pp. 642-669.

[2] J. Kiefer and J. Wolfowitz, “On the deviations of the empiric distribution function of vector chance variables,” Trans. Amer. Math. Soc., Vol. 87 (Jan. 1958), pp. 173-186.

[3] M. D. Donsker, “Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems,” Ann. Math. Stat., Vol. 23 (1952), pp. 277-281.

[4] J. L. Doob, “Heuristic approach to the Kolmogorov-Smirnov theorems,” Ann. Math. Stat., Vol. 20 (1949), pp. 393-402.







[5] C. G. Esseen, “Fourier analysis of distribution functions,” Acta Math., Vol. 77 (1945), pp. 1-125.

[6] H. Bergström, “On the central limit theorem in the space $R_k$, $k > 1$,” Skand. Aktuarietidskr., (1949), pp. 106-127.

[7] R. v. Mises, “Fundamentalsätze der Wahrscheinlichkeitsrechnung,” Math. Zeit., Vol. 4 (1919), pp. 1-97.

[8] T. W. Anderson, “The integral of a symmetric unimodal function,” Proc. Amer. Math. Soc., Vol. 6 (1955), pp. 170-176.





BAYES SOLUTIONS OF THE STATISTICAL INVENTORY PROBLEM¹


By HERBERT SCARF
Stanford University


1. Introduction. The inventory problem as discussed in the paper of Arrow, 
Harris, and Marschak [1] is a sequential decision problem. At the beginning of 
each time period a decision is made to stock a quantity of a specific item in an- 
ticipation of demand during that period. If the demand exceeds the available 
supply, the next period is begun with an initial stock which is either zero or 
negative. If the demand, on the other hand, is less than the available supply, 
the subsequent period begins with a stock level which is equal to the excess of 
supply over demand. In both cases the stock level may be augmented by addi- 
tional purchasing. 

A number of costs are assumed to be operative in this situation, among them, 
a cost of purchasing stock, a cost of holding or maintaining the stock in inven- 
tory, and a cost which arises whenever current inventory is insufficient to meet 
demand. The main problem is the determination of a sequence of purchasing or 
ordering decisions which minimizes some criterion built up from these costs. 
The criterion adopted in this paper is to discount costs incurred $n$ periods in the future by a factor $\alpha^n$, and to select that sequence of stockage decisions which minimizes the sum of all discounted costs. An elaborate discussion of the costs and the general structure of inventory models may be found in [2]. In the paper of Arrow, Harris, and Marschak, and in a number of other papers in this area, the assumption has been made that the quantity demanded during any time period is a random variable whose distribution is known and unchanging from period to period. However, in the second of two papers by Dvoretzky, Kiefer and Wolfowitz [3], a more general situation in which the demand distribution is not known precisely is examined from the point of view of statistical decision theory. In the present paper our concern will be with this latter problem. The treatment, in distinction to that given in the paper by Kiefer et al., will be very specific. We shall restrict our attention to very simple types of cost functions in order to obtain some detailed results about the optimal stockage policies.

The costs will be as follows: 

(a) The ordering cost c(z), as a function of the amount ordered, will be as- 
sumed to be linear, i.e., c(z) = cz. 

(b) If the inventory at the close of the period is positive, a holding cost $h(x)$ will be incurred, which in this paper is assumed to be a linear function of the quantity of inventory on hand at the end of the period, i.e., $h(x) = hx$.

(c) If the quantity of stock at the end of any period is negative, a linear penalty cost will be incurred. The marginal penalty cost is represented by the constant $p$. It will be assumed that $p > c(1-\alpha)$.

Received June 13, 1958.
¹ Research sponsored by the Office of Naval Research N6onr-25140 (NR-342-022).

STATISTICAL INVENTORY PROBLEM 491

It should be noted that under these assumptions, if the demand distribution is known precisely, the optimal stockage policy is exceedingly simple to determine. In fact there exists a single critical stock level $\bar{x}$ with the following property. If the stock on hand at the beginning of any period is greater than $\bar{x}$, we do not order. If the stock level $x$ at the beginning of the period is less than $\bar{x}$, we order an amount $\bar{x} - x$. The number $\bar{x}$ is obtained as the solution of the equation

(1) $$\frac{p - c(1-\alpha)}{h+p} = \int_0^{\bar{x}} \varphi(\xi)\, d\xi,$$

where $\varphi(\xi)$ is the density of the distribution of demand.
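Equation (1) determines $\bar{x}$ as a quantile of the demand distribution. The minimal Python sketch below solves it by bisection for any continuous, increasing c.d.f. The function name, the exponential demand, and the numbers $c = h = 1$, $p = 5$, $\alpha = 0.9$ are illustrative assumptions, not values from the paper. For exponential demand with rate $\omega$, the root is $\omega^{-1}\log[(h+p)/(h + c(1-\alpha))]$, which gives a closed form to compare against.

```python
import math


def critical_level(cdf, c, h, p, alpha, hi=1.0, tol=1e-12):
    """Solve (1), cdf(x) = (p - c(1 - alpha)) / (h + p), by bisection.

    `cdf` must be continuous and increasing on [0, inf) with cdf(0) = 0.
    """
    q = (p - c * (1 - alpha)) / (h + p)
    if not 0 < q < 1:
        raise ValueError("need p > c(1 - alpha) so that 0 < q < 1")
    lo = 0.0
    while cdf(hi) < q:            # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


omega = 0.5                       # exponential demand, mean 1/omega = 2 (illustrative)
def cdf(x):
    return 1.0 - math.exp(-omega * x)

x_bar = critical_level(cdf, c=1.0, h=1.0, p=5.0, alpha=0.9)
print(x_bar, math.log((1.0 + 5.0) / (1.0 + 1.0 * (1 - 0.9))) / omega)
```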

In this paper, rather than making the assumption that the density of demand is known precisely, we assume that it may be described by a density $\varphi(\xi,\omega)$ with an unknown statistical parameter $\omega$. It will be convenient to restrict our attention to densities $\varphi(\xi,\omega)$ which are members of the exponential class, i.e.,

(2) $$\varphi(\xi,\omega) = \beta(\omega)\, e^{-\omega\xi}\, r(\xi),$$

with $r(\xi) = 0$ for $\xi < 0$. The reason for this restriction is that at the $n$th stage all of the relevant information may be summarized in the single statistic $s = (\xi_1 + \cdots + \xi_{n-1})/(n-1)$. We shall assume that $r(\xi)$ is bounded, and that $r(\xi) > 0$ for $\xi > 0$. We shall also assume that an a priori Bayes distribution is chosen for the unknown parameter $\omega$. At the beginning of the $n$th period


the available information will consist of two parts: a knowledge of the current stock level $x$, and a knowledge of the statistic $s$. The optimal policy at the $n$th period will be that function of the two variables $x$ and $s$ which indicates the proper quantity of the items to be purchased. Our first result, which is crucially dependent upon the assumptions made about the cost functions, is that the optimal policies are again defined by single critical numbers. In fact there is a sequence of functions $\bar{x}_n(s)$ such that the quantity to order is equal to

(3) $$\max\{\bar{x}_n(s) - x,\ 0\}$$ (see Theorem 1).


The critical functions $\bar{x}_n(s)$ are themselves quite difficult to obtain analytically. Some properties of the functions may be obtained. For example, it will be shown that $\bar{x}_n(s)$ is monotone increasing, a fact which is essentially a consequence of the monotone likelihood ratio property possessed by all members of the exponential class [7]. This result is demonstrated in Theorem 2.

Theorem 3 presents an additional result. If at the beginning of the $n$th period the stock level is less than $\bar{x}_n(s)$, we bring the stock level up to this number. If the demand during the $n$th period is $\xi$, the new stock level is $\bar{x}_n(s) - \xi$ and the new average demand is $((n-1)s + \xi)/n$. In Theorem 3 it is demonstrated that no ordering is done at the beginning of the $(n+1)$st period if $\xi$ is sufficiently small. Mathematically this is equivalent to





492 HERBERT SCARF 


$$\bar{x}_n(s) - \xi \ge \bar{x}_{n+1}\big([(n-1)s + \xi]/n\big)$$
for $\xi$ sufficiently small.

Our primary interest, however, will be in obtaining an asymptotic expansion for the critical functions $\bar{x}_n(s)$ for large $n$. It will be shown, under the assumption of certain regularity conditions, that

(4) $$\bar{x}_n(s) \sim \bar{x}(s) + \frac{a(s)}{n}.$$


The function $\bar{x}(s)$ is the critical level for the inventory problem in which the demand density is known precisely. In fact, it is known to be $\varphi(\xi,\omega)$, where $\omega$ is selected so that the mean demand is actually $s$. This is, of course, the value of $\omega$ which satisfies the equation

(5) $$\frac{d}{d\omega}\log\beta(\omega) = s,$$

which we shall write as $\omega(s)$.
dw 

Therefore, as (1) indicates, the function $\bar{x}(s)$ is given by the solution of the equation

(6) $$\frac{p - c(1-\alpha)}{h+p} = \int_0^{\bar{x}(s)} \varphi(\xi,\omega(s))\, d\xi.$$
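As a worked instance of (5) and (6), assume $r(\xi) = 1$ for $\xi \ge 0$. This is an illustrative choice that is bounded and positive, as required, and it makes demand exponential:

```latex
\begin{aligned}
\beta(\omega) &= \Big(\int_0^\infty e^{-\omega\xi}\,d\xi\Big)^{-1} = \omega,
\qquad \frac{d}{d\omega}\log\beta(\omega) = \frac{1}{\omega} = s
\;\Longrightarrow\; \omega(s) = \frac{1}{s},\\
\int_0^{\bar{x}(s)} \frac{1}{s}\,e^{-\xi/s}\,d\xi &= 1 - e^{-\bar{x}(s)/s} = \frac{p - c(1-\alpha)}{h+p}
\;\Longrightarrow\;
\bar{x}(s) = s\,\log\frac{h+p}{h + c(1-\alpha)}.
\end{aligned}
```

In this special case, the maximum likelihood critical level is linear and increasing in $s$.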


The term $a(s)$ may also be determined explicitly. Let $\Phi(x,\omega)$ be the cumulative demand distribution, and let $\sigma^2(\omega)$ be the variance of demand if the true parameter value is $\omega$ ($\sigma^2 = -d^2\log\beta/d\omega^2$). Then
a (f98 
“ dw \o? Ow 
(7) a(s) = an 


with $x$ replaced by $\bar{x}(s)$ and $\omega$ replaced by $\omega(s)$ after the differentiation is carried out. The expansion is valid if $f$ has two continuous derivatives near the point $\omega(s)$, $f(\omega(s)) > 0$, and $0 < s < \max_\omega (\beta'/\beta)(\omega)$.
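The identities behind (5) and the definition of $\sigma^2$ are that the mean of $\varphi(\xi,\omega)$ is $(\log\beta)'(\omega)$ and its variance is $-(\log\beta)''(\omega)$. The Python sketch below (NumPy assumed) checks both by quadrature and finite differences. It uses the illustrative choice $r(\xi) = \xi e^{-\xi}$, which is bounded and nonnegative. Demand is then Gamma with shape 2 and rate $\omega + 1$, so the exact mean and variance are $2/(\omega+1)$ and $2/(\omega+1)^2$. The grid and step sizes are assumptions chosen for accuracy, not part of the paper.

```python
import numpy as np

XI = np.linspace(0.0, 60.0, 600_001)          # quadrature grid for demand xi


def trap(y, x):
    """Trapezoidal rule (written out to avoid NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def r(xi):
    return xi * np.exp(-xi)                   # illustrative bounded r(xi) >= 0


def log_beta(w):
    # beta(w) normalizes beta(w) * exp(-w xi) * r(xi) to a probability density
    return -np.log(trap(np.exp(-w * XI) * r(XI), XI))


def moments(w):
    dens = np.exp(log_beta(w) - w * XI) * r(XI)
    mean = trap(XI * dens, XI)
    var = trap((XI - mean) ** 2 * dens, XI)
    return mean, var


w, eps = 0.7, 1e-3
d1 = (log_beta(w + eps) - log_beta(w - eps)) / (2 * eps)                 # (log beta)'
d2 = (log_beta(w + eps) - 2 * log_beta(w) + log_beta(w - eps)) / eps**2  # (log beta)''
mean, var = moments(w)
print(d1, mean, 2 / (w + 1))
print(-d2, var, 2 / (w + 1) ** 2)
```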

The sign of $a(s)$ may be either positive or negative. Since $\omega(s)$ is the maximum likelihood estimator of the true parameter value, $\bar{x}(s)$ will be the maximum likelihood estimate of the critical stockage level. Equation (4) therefore indicates that the optimal Bayes stockage policy approaches the maximum likelihood policy occasionally from above and occasionally from below, depending on the costs and the original Bayes estimate.

It is important to realize that in this discussion neither $s$ nor $\bar{x}_n(s)$ is a random variable. For each $n$, $\bar{x}_n(s)$ is a specific function which minimizes total cost under the assumption that a specific Bayes distribution is operative. These functions are difficult to compute directly. Our purpose in obtaining an asymptotic expansion is solely to approximate these functions in a simple way for large values of $n$.

The crucial part of the proof of (4) is the determination of the first several terms in the asymptotic expansion of the a posteriori Bayes density. There are many points in common with some recent work done by Johns and Guthrie [5] on a problem of acceptance sampling.


2. A review of the problem when the demand distribution is known. In this section we shall assemble those facts about the non-statistical version of our problem which will be needed in the subsequent sections of this paper. We begin by defining $C(x)$ to be the expected cost incurred by the use of an optimal policy if the initial stock level is $x$. Costs which occur $n$ periods in the future are discounted by a factor $\alpha^n$ and incorporated in the function $C(x)$. In the model that we are considering, $x$ may take on negative values, representing unfilled orders.

If the policy used is to order an amount $y - x$ when the initial stock is $x$, then the cost incurred during the first period is $c(y - x) + L(y)$, where

(8) $$L(y) = \begin{cases} h\int_0^y (y - \xi)\varphi(\xi)\,d\xi + p\int_y^\infty (\xi - y)\varphi(\xi)\,d\xi, & y > 0,\\[4pt] p\int_0^\infty (\xi - y)\varphi(\xi)\,d\xi, & y \le 0.\end{cases}$$
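Assume, for illustration, exponential demand $\varphi(\xi) = \omega e^{-\omega\xi}$. Then (8) has a closed form: $L(y) = h\,[y - (1 - e^{-\omega y})/\omega] + p\,e^{-\omega y}/\omega$ for $y > 0$, and $L(y) = p(1/\omega - y)$ for $y \le 0$. Its derivative is $L'(y) = (h+p)\Phi(y) - p$ for $y > 0$ and $L'(y) = -p$ for $y < 0$. This derivative is what turns the first-order condition for the critical number into (1). The short Python check below compares the closed form against finite differences and a direct midpoint-rule evaluation of (8). The function names and parameter values are hypothetical.

```python
import math


def L_exponential(y, h, p, omega):
    """One-period expected cost (8) when demand has density omega * exp(-omega * xi)."""
    if y > 0:
        holding = h * (y - (1.0 - math.exp(-omega * y)) / omega)   # h * E(y - xi)^+
        shortage = p * math.exp(-omega * y) / omega                 # p * E(xi - y)^+
        return holding + shortage
    return p * (1.0 / omega - y)                                    # p * E(xi - y)


def L_prime(y, h, p, omega):
    """(h + p) Phi(y) - p for y > 0, and -p for y < 0."""
    return (h + p) * (1.0 - math.exp(-omega * y)) - p if y > 0 else -p


h, p, omega, eps = 1.0, 5.0, 0.5, 1e-6
for y in (-1.0, 0.5, 2.0, 6.0):
    numeric = (L_exponential(y + eps, h, p, omega) - L_exponential(y - eps, h, p, omega)) / (2 * eps)
    print(y, numeric, L_prime(y, h, p, omega))
```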


At the end of the first period the stock on hand is $y - \xi$. The situation faced at the beginning of the second period is precisely that faced at the beginning of the first period, except that the stock level is now $y - \xi$. We see, therefore, that the expectation of all future costs may be summarized by $\alpha\int_0^\infty C(y - \xi)\varphi(\xi)\,d\xi$. Hence $C(x) \le c(y - x) + L(y) + \alpha\int_0^\infty C(y - \xi)\varphi(\xi)\,d\xi$ for every $y \ge x$. The choice of $y$ which minimizes the right-hand side of this inequality is clearly optimal, and we obtain

(9) $$C(x) = \inf_{y \ge x}\Big\{c(y - x) + L(y) + \alpha\int_0^\infty C(y - \xi)\varphi(\xi)\,d\xi\Big\}.$$

It is sometimes convenient to consider an inventory problem with precisely the same costs as the above problem, but which is engaged in for a total of $N$ periods rather than continuing indefinitely in the future. No costs, either holding or penalty costs, are incurred after $N$ periods have elapsed, and no salvage value is to be attributed to any excess stock at the end of $N$ periods. If the minimum expected costs for such a problem are denoted by $C^N(x)$, then $C^N(x)$ tends to $C(x)$ at every point [6], and we have the following equation:

(10) $$C^N(x) = \inf_{y \ge x}\Big\{c(y - x) + L(y) + \alpha\int_0^\infty C^{N-1}(y - \xi)\varphi(\xi)\,d\xi\Big\},$$

where $C^0(x) = 0$.

Equation (10) may be used to show that $C^N(x)$ is convex, and therefore $C(x)$ is convex. A simple argument, based on (9) and the convexity of $C(x)$, shows that the optimal stockage policy is indeed defined by the single critical number which minimizes

(11) $$cy + L(y) + \alpha\int_0^\infty C(y - \xi)\varphi(\xi)\,d\xi.$$







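(This argument is given in detail in the next section for the statistical problem.) Equation (9) shows us that $C'(x) = -c$ for $x < \bar{x}$. At $y = \bar{x}$ every $\bar{x} - \xi$ with $\xi > 0$ lies below $\bar{x}$, so setting the derivative of (11) equal to zero shows that $\bar{x}$ satisfies the equation

$$0 = c(1-\alpha) + L'(\bar{x}).$$

Since $L'(y) = (h+p)\int_0^y \varphi(\xi)\,d\xi - p$ for $y > 0$, this is merely a restatement of equation (1).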
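The critical-number structure can also be seen by brute force. The sketch below runs value iteration on a discretized version of (9), under assumptions that are not from the paper:

- stock levels are integers;
- demand is an integer with a truncated geometric distribution;
- the grid is extended below its lower end with slope $-c$, which is the linear form $C$ takes below the critical level.

All numbers and names are illustrative. The sketch compares the minimizer of the function in (11) with the discrete analogue of (1), namely the smallest $y$ with $P(D \le y) \ge (p - c(1-\alpha))/(h+p)$. It also compares the computed $C(\bar{x})$ with the closed form that the text derives next as (13). It assumes NumPy.

```python
import numpy as np


def value_iteration(pmf, c, h, p, alpha, xmin=-20, xmax=60, iters=600):
    """Value iteration for a discretized (9): integer stock levels, P(D = k) = pmf[k].

    Below xmin the cost is extended linearly with slope -c, which is the form C takes
    below the critical level (order up to it); xmin must lie below that level.
    """
    pmf = np.asarray(pmf, dtype=float)
    ks = np.arange(len(pmf))
    xs = np.arange(xmin, xmax + 1)
    # one-period cost L(y) of (8): holding on (y - D)^+, penalty on (D - y)^+
    Ls = np.array([pmf @ (h * np.maximum(y - ks, 0) + p * np.maximum(ks - y, 0)) for y in xs])
    J = np.arange(len(xs))[:, None]
    idx = J - ks[None, :]                        # grid index of y - D
    inside = idx >= 0
    C = np.zeros(len(xs))
    for _ in range(iters):
        below = C[0] + c * (ks[None, :] - J)      # linear extension off the grid
        EC = np.where(inside, C[np.maximum(idx, 0)], below) @ pmf
        G = c * xs + Ls + alpha * EC             # the function minimized in (9) and (11)
        C = -c * xs + np.minimum.accumulate(G[::-1])[::-1]   # inf over y >= x
    return xs, Ls, C, G


theta = 0.8
pmf = (1 - theta) * theta ** np.arange(61)
pmf /= pmf.sum()                                  # truncated geometric demand
c, h, p, alpha = 1.0, 1.0, 5.0, 0.9
xs, Ls, C, G = value_iteration(pmf, c, h, p, alpha)

S = int(xs[np.argmin(G)])                         # computed critical level
q = (p - c * (1 - alpha)) / (h + p)
S_rule = int(np.argmax(np.cumsum(pmf) >= q))      # smallest y with P(D <= y) >= q
mean_demand = float(pmf @ np.arange(61))
C_13 = (Ls[xs == S][0] + alpha * c * mean_demand) / (1 - alpha)   # discrete (13)
print(S, S_rule, C[xs == S][0], C_13)
```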

This method permits us to make a direct computation of the optimal stockage policy without making an explicit determination of the cost $C(x)$. In the treatment of the statistical problem, we shall find it necessary to use the fact that, in some sense, the minimum cost function depends continuously upon the demand distribution. In order to obtain this result we shall derive an expression for $C(x)$.

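If $x \ge \bar{x}$, equation (9) yields

$$C(x) = L(x) + \alpha\int_0^\infty C(x - \xi)\varphi(\xi)\,d\xi = L(x) + \alpha\int_0^{x-\bar{x}} C(x - \xi)\varphi(\xi)\,d\xi + \alpha\int_{x-\bar{x}}^\infty C(x - \xi)\varphi(\xi)\,d\xi.$$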


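In the latter integral $x - \xi < \bar{x}$, so that $C(x - \xi) = c(\bar{x} - x + \xi) + C(\bar{x})$, and we obtain for $x \ge \bar{x}$

$$C(x) = L(x) + \alpha\big(c\bar{x} - cx + C(\bar{x})\big)\int_{x-\bar{x}}^\infty \varphi(\xi)\,d\xi + \alpha c\int_{x-\bar{x}}^\infty \xi\varphi(\xi)\,d\xi + \alpha\int_0^{x-\bar{x}} C(x - \xi)\varphi(\xi)\,d\xi.$$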
If we put $x = \bar{x}$, we obtain

(13) $$C(\bar{x}) = \frac{L(\bar{x}) + \alpha c\int_0^\infty \xi\varphi(\xi)\,d\xi}{1-\alpha},$$


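and if we define $U(x) = C(x + \bar{x})$, we obtain

(14) $$U(x) = A(x) + \alpha\int_0^x U(x - \xi)\varphi(\xi)\,d\xi,$$

where

$$A(x) = L(x + \bar{x}) + \alpha\big(-cx + C(\bar{x})\big)\int_x^\infty \varphi(\xi)\,d\xi + \alpha c\int_x^\infty \xi\varphi(\xi)\,d\xi.$$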


Equation (14) is a Volterra equation which may be solved for $U(x)$ as follows. We define $M(x,\alpha) = \sum_{n=1}^\infty \alpha^n \Phi^{(n)}(x)$, where $\Phi^{(n)}$ is the $n$-fold convolution of the cumulative demand distribution. Then







(15) $$U(x) = C(x + \bar{x}) = A(x) + \int_0^x A(x - \xi)\, dM(\xi,\alpha).$$


This discussion may be used to prove the following lemma. 

Lemma 1. Let $\varphi_k(x)$ be a sequence of positive density functions which converges uniformly in every finite interval to the positive density function $\varphi(x)$. Furthermore, assume that the functions have finite means which converge to the finite mean of $\varphi$. If we define $C^k(x)$ to be the cost function for the inventory problem with demand density $\varphi_k(x)$, then $C^k(x)$ converges to $C(x)$ at every point.

Define $\bar{x}^k$ to be the critical stockage level for the problem with demand density $\varphi_k(x)$. It is then clear from equation (1) that $\bar{x}^k \to \bar{x}$. By uniform convergence and (8) we see that $L^k(\bar{x}^k) \to L(\bar{x})$, and therefore by (13) $C^k(\bar{x}^k) \to C(\bar{x})$.

If $x < \bar{x}$, then for large $k$ we have $x < \bar{x}^k$ and $C^k(x) = c(\bar{x}^k - x) + C^k(\bar{x}^k)$, which tends to $C(x)$. On the other hand, if $x > \bar{x}$, and consequently $x > \bar{x}^k$ for large $k$, then we may use (15) to show that $C^k(x)$ tends to $C(x)$. The only point that needs mention is the following: if we define $M^k(x,\alpha) = \sum_{n=1}^\infty \alpha^n \Phi_k^{(n)}(x)$, then $M^k(x,\alpha)$ converges uniformly in any finite $x$-interval to $M(x,\alpha)$.
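Equation (15) is the resolvent (Neumann-series) solution of the Volterra equation (14). The discretized Python sketch below assumes NumPy. Replacing $\varphi(\xi)\,d\xi$ by point masses $m_j$ on a grid turns (14) into a lower-triangular linear system. The sketch solves that system by forward substitution and compares the result with the truncated series $A + \sum_{n\ge1}\alpha^n (m^{*n} * A)$, which is the discrete counterpart of $A(x) + \int_0^x A(x-\xi)\,dM(\xi,\alpha)$. The function $A$, the grid, and the exponential demand used in the example are hypothetical.

```python
import numpy as np


def solve_volterra(A, m, alpha):
    """Solve U_i = A_i + alpha * sum_{j=0}^{i} m_j U_{i-j} by forward substitution.

    A holds A(x_i) on the grid x_i = i * dx and m_j is the demand probability of
    [x_j, x_{j+1}), a discretized version of (14).
    """
    U = np.zeros(len(A))
    for i in range(len(A)):
        tail = sum(m[j] * U[i - j] for j in range(1, i + 1))
        U[i] = (A[i] + alpha * tail) / (1.0 - alpha * m[0])
    return U


def resolvent_series(A, m, alpha, terms=300):
    """U = A + sum_{n >= 1} alpha^n (m^{*n} * A), the discrete counterpart of (15)."""
    N = len(A)
    U = np.array(A, dtype=float)
    mn = np.zeros(N)
    mn[0] = 1.0                                # m^{*0}: unit mass at 0
    for n in range(1, terms + 1):
        mn = np.convolve(mn, m)[:N]            # m^{*n}, truncated to the grid
        U += alpha ** n * np.convolve(mn, A)[:N]
    return U


dx, N, alpha, omega = 0.05, 200, 0.9, 0.5
x = dx * np.arange(N)
m = np.exp(-omega * x) - np.exp(-omega * (x + dx))    # exponential demand masses
A = 1.0 + x                                           # hypothetical A(x), for illustration
U1 = solve_volterra(A, m, alpha)
U2 = resolvent_series(A, m, alpha)
print(np.max(np.abs(U1 - U2)))
```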


3. Properties of Bayes solutions to the statistical problem. In this section 
we shall set up a functional equation analogous to (9) and discuss some simple 
properties of Bayes solutions to the problem. The asymptotic expansion of the 
Bayes solutions will be discussed in the subsequent section. 

We assume that the demand density is of the form

(16) $$\varphi(\xi,\omega) = \beta(\omega)\, e^{-\omega\xi}\, r(\xi).$$

We shall also assume that an a priori distribution has been selected for the unknown parameter $\omega$. We denote the density of this distribution by $f(\omega)$.

At the beginning of the $n$th period, the information available to the decision maker consists of the present stock level $x$ and a record of all the previous demands $\xi_1, \xi_2, \dots, \xi_{n-1}$. All of the demand information may be summarized in the sufficient statistic

(17) $$s = \frac{1}{n-1}\sum_{i=1}^{n-1} \xi_i.$$

We define $C_n(x,s)$ to be the discounted sum of costs incurred if an optimal policy is followed. We first of all remark that the demand density faced by the decision maker during the $n$th period is the conditional density of the demand $\xi$, given that the average demand in the last $n-1$ periods is $s$. This density is

(18) $$\varphi_n(\xi \mid s) = r(\xi)\,\frac{\int_0^\infty \beta^n(\omega)\, e^{-\omega\xi - (n-1)\omega s} f(\omega)\, d\omega}{\int_0^\infty \beta^{n-1}(\omega)\, e^{-(n-1)\omega s} f(\omega)\, d\omega}.$$

This distribution summarizes the decision maker's expectations as far as demand during the $n$th period is concerned.
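For a concrete case of (18), assume $r(\xi) = 1$ for $\xi \ge 0$, so that $\beta(\omega) = \omega$, and a Gamma prior $f(\omega) \propto \omega^{a-1}e^{-b\omega}$. Both are illustrative assumptions, not the paper's. The integrals are then Gamma integrals, and (18) reduces to the Lomax (Pareto type II) density $\varphi_n(\xi\mid s) = k\lambda^k/(\xi+\lambda)^{k+1}$, with $k = n - 1 + a$ and $\lambda = (n-1)s + b$. The Python sketch below (NumPy assumed; the function names and the $\omega$ grid are hypothetical) evaluates (18) by quadrature and compares it with this closed form.

```python
import numpy as np

W = np.linspace(1e-9, 8.0, 400_001)            # quadrature grid for omega (illustrative)


def trap(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def predictive_numeric(xi, s, n, a, b):
    """(18) by quadrature, with beta(w) = w, r = 1 and prior f(w) proportional to w^(a-1) e^(-b w)."""
    log_f = (a - 1) * np.log(W) - b * W
    log_num = n * np.log(W) - W * (xi + (n - 1) * s) + log_f
    log_den = (n - 1) * np.log(W) - (n - 1) * s * W + log_f
    mn, md = log_num.max(), log_den.max()      # rescale before exponentiating
    return float(np.exp(mn - md)) * trap(np.exp(log_num - mn), W) / trap(np.exp(log_den - md), W)


def predictive_closed_form(xi, s, n, a, b):
    """Lomax density k lam^k / (xi + lam)^(k + 1) with k = n - 1 + a, lam = (n - 1) s + b."""
    k, lam = n - 1 + a, (n - 1) * s + b
    return k * lam ** k / (xi + lam) ** (k + 1)


for xi in (0.0, 1.0, 3.0):
    print(xi, predictive_numeric(xi, 2.0, 10, 2.0, 1.0),
          predictive_closed_form(xi, 2.0, 10, 2.0, 1.0))
```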







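If the policy used is to order an amount $y - x$, then the expected cost during the $n$th period is $c(y - x) + L_n(y \mid s)$, where

(19) $$L_n(y \mid s) = \begin{cases} h\int_0^y (y - \xi)\varphi_n(\xi\mid s)\,d\xi + p\int_y^\infty (\xi - y)\varphi_n(\xi\mid s)\,d\xi, & y > 0,\\[4pt] p\int_0^\infty (\xi - y)\varphi_n(\xi\mid s)\,d\xi, & y \le 0.\end{cases}$$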


If demand in the $n$th period is $\xi$, then the stock level at the beginning of the $(n+1)$st period is $y - \xi$. The statistic $s$, which represents average demand in the preceding periods, becomes $((n-1)s + \xi)/n = s + (\xi - s)/n$. Therefore the discounted expected cost from the $(n+1)$st period onward, if an optimal policy is followed in those periods, is $C_{n+1}(y - \xi, s + (\xi - s)/n)$. We therefore obtain the following functional equation:

(20) $$C_n(x,s) = \inf_{y \ge x}\Big\{c(y - x) + L_n(y\mid s) + \alpha\int_0^\infty C_{n+1}\Big(y - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi\Big\}.$$

It is worth remarking that this heuristic derivation may be made quite rigorous by considering the abstract decision spaces [6].
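The bookkeeping behind (20) can be sketched as a simulation loop. Each period, the loop orders up to a critical level, observes demand, lets unfilled demand appear as negative stock, and updates $s$ by $s + (\xi - s)/n$. The critical function used here is a hypothetical stand-in, not the Bayes critical numbers $\bar{x}_n(s)$ of the paper. It is the maximum likelihood plug-in level for exponential demand, $s\log[(h+p)/(h + c(1-\alpha))]$. That choice, the exponential demand with rate 0.5, and the cost values are all assumptions for illustration.

```python
import math
import random


def simulate(critical, periods, draw_demand, x0=0.0):
    """Order up to critical(n, s) each period and update s as in the derivation of (20).

    `critical(n, s)` is a stand-in for the critical function; s is the mean of the
    demands before period n (set to 0.0 when n = 1, where no data exist yet).
    """
    x, s, history, demands = x0, 0.0, [], []
    for n in range(1, periods + 1):
        s_before = s
        order = max(critical(n, s) - x, 0.0)   # the rule max(xbar_n(s) - x, 0)
        y = x + order                          # stock after ordering
        xi = draw_demand()
        demands.append(xi)
        x = y - xi                             # negative values are unfilled orders
        s = s + (xi - s) / n                   # ((n - 1) s + xi) / n
        history.append((n, s_before, order, y, x, s))
    return history, demands


h, p, c, alpha = 1.0, 5.0, 1.0, 0.9
factor = math.log((h + p) / (h + c * (1 - alpha)))


def plug_in(n, s):                             # hypothetical stand-in, not the Bayes rule
    return s * factor


rng = random.Random(0)
history, demands = simulate(plug_in, 50, lambda: rng.expovariate(0.5))
print(history[-1])
```

The incremental update keeps only $s$ and $n$, which is all that (20) requires, instead of storing the whole demand record.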

In order to analyze equation (20), we first define $C_n^N(x,s)$ to be the optimal discounted expected cost from time period $n$ onwards if the inventory problem is engaged in for a total of $N$ periods. It may be shown that

(21) $$C_n^N(x,s) = \inf_{y \ge x}\Big\{c(y - x) + L_n(y\mid s) + \alpha\int_0^\infty C_{n+1}^N\Big(y - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi\Big\}$$

and that $\lim_{N\to\infty} C_n^N(x,s) = C_n(x,s)$. If $n > N$, then $C_n^N(x,s) = 0$.

Lemma 2. $C_n^N(x,s)$ has a continuous derivative with respect to $x$ and is convex with respect to $x$. The optimal policies are defined by single critical numbers $\bar{x}_n^N(s) \ge 0$. $C_n^N(x,s)$ has a continuous second derivative with respect to $x$ at all points except perhaps $x = \bar{x}_n^N(s)$, at which point both the right and left hand second derivatives exist.

The lemma is clearly true when $n = N + 1$. Let us assume that it is true for $n + 1$ and show that it remains true for $n$. The function

(22) $$cy + L_n(y\mid s) + \alpha\int_0^\infty C_{n+1}^N\Big(y - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi$$

is a differentiable convex function. As $y \to +\infty$ this function becomes positively infinite. To see this, we notice that this function is at least as large as

$$cy + L_n(y\mid s)$$







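and for $y > 0$ this latter function has a derivative equal to $c - p + (h+p)\int_0^y \varphi_n(\xi\mid s)\,d\xi$, which is certainly positive for large $y$, since it tends to $c + h$. For $y < 0$, the stock level satisfies $y - \xi < \bar{x}_{n+1}^N(s + (\xi - s)/n)$ (by the assumption that all of the critical numbers are positive), and therefore

$$C_{n+1}^N\Big(y - \xi,\ s + \frac{\xi - s}{n}\Big) = c\Big[\bar{x}_{n+1}^N\Big(s + \frac{\xi - s}{n}\Big) - y + \xi\Big] + C_{n+1}^N\Big(\bar{x}_{n+1}^N\Big(s + \frac{\xi - s}{n}\Big),\ s + \frac{\xi - s}{n}\Big).$$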


It follows that $(\partial/\partial y)\,C_{n+1}^N(y - \xi, s + (\xi - s)/n) = -c$ for $y < 0$. Since the derivative of $L_n(y\mid s)$ is $-p$ there, the derivative of (22) with respect to $y$ is $c(1-\alpha) - p < 0$. It follows that the minimum of (22) occurs at a point $\bar{x}_n^N(s) > 0$. From the convexity of (22) and equation (21) it follows that $\bar{x}_n^N(s)$ is indeed the critical number. In order to show the convexity of $C_n^N(x,s)$, we notice first of all that for $x \le \bar{x}_n^N(s)$ the function $C_n^N(x,s)$ is linear in $x$. For $y \ge \bar{x}_n^N(s)$, $C_n^N(y,s)$ equals (22) with the term $cy$ replaced by zero, and is therefore convex.

The only problem in showing the differentiability of $C_n^N(x,s)$ with respect to $x$ is in verifying that the right and left hand derivatives are the same at the point $\bar{x}_n^N(s)$. This follows from the fact that the derivative of (22) equals zero when $y = \bar{x}_n^N(s)$.

It follows from this lemma that $C_n(x,s)$ is a convex function with respect to $x$, and also that its derivative with respect to $x$ is equal to $-c$ when $x < 0$. A repetition of the above arguments yields the following result.

THEOREM 1. There is a sequence of non-negative functions $\bar{x}_n(s)$ such that the optimal ordering rule is to purchase $\max(\bar{x}_n(s) - x, 0)$. In fact, $\bar{x}_n(s) = \lim_{N\to\infty} \bar{x}_n^N(s)$.

In the course of proving this theorem, the critical number $\bar{x}_n(s)$ will be found to be the unique minimizer of

(23) $$cy + L_n(y\mid s) + \alpha\int_0^\infty C_{n+1}\Big(y - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi.$$

The uniqueness of the minimum is a consequence of the fact that (23) has strictly increasing first differences. This is itself implied by $L_n''(y\mid s) = (h+p)\,\varphi_n(y\mid s) > 0$.

Unlike the case in which the demand distribution is known precisely, there is no simple equation along the lines of (1) which determines the critical numbers $\bar{x}_n(s)$. Certain properties of the critical functions may, however, be derived from a direct examination of equation (20). In the following argument we shall show that $\bar{x}_n(s)$ is increasing in $s$, a fact which is interesting in its own right and necessary for the considerations of Section 4.

Lemma 3. $\partial C_n^N(x,s)/\partial x$ is decreasing in $s$.

The proof of this lemma is by backwards induction on the subscript $n$. The proof uses, in a crucial way, a property of densities with a monotone likelihood ratio [7]. A probability density $p(x\mid\omega)$ is said to have a monotone likelihood ratio if


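(24) $$\det\begin{pmatrix} p(x_1\mid\omega_1) & p(x_1\mid\omega_2)\\ p(x_2\mid\omega_1) & p(x_2\mid\omega_2)\end{pmatrix} \ge 0 \quad \text{for } x_1 > x_2 \text{ and } \omega_1 > \omega_2.$$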


We shall use the fact, demonstrated in [7], that if $p(x\mid\omega)$ has a monotone likelihood ratio, then $h(\omega) = \int p(x\mid\omega)\,g(x)\,dx$ is a monotone decreasing function of $\omega$ if $g(x)$ is a monotone decreasing function of $x$. The particular densities that we shall be interested in are the densities

$$\varphi_n(\xi\mid s) = r(\xi)\,\frac{\int_0^\infty \beta^n(\omega)\, e^{-\omega\xi - (n-1)\omega s} f(\omega)\, d\omega}{\int_0^\infty \beta^{n-1}(\omega)\, e^{-(n-1)\omega s} f(\omega)\, d\omega}$$

defined in equation (18).
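Suppose the determinant (24) is formed using this density, with $s$ in the role of the parameter. It will be seen to have the same sign as the determinant

$$\det\begin{pmatrix} \int_0^\infty \beta^n e^{-\omega\xi_1 - (n-1)\omega s_1} f\,d\omega & \int_0^\infty \beta^n e^{-\omega\xi_1 - (n-1)\omega s_2} f\,d\omega\\[4pt] \int_0^\infty \beta^n e^{-\omega\xi_2 - (n-1)\omega s_1} f\,d\omega & \int_0^\infty \beta^n e^{-\omega\xi_2 - (n-1)\omega s_2} f\,d\omega \end{pmatrix},$$

and this may be seen to be positive, by virtue of the fact that

$$\log\int_0^\infty e^{-\omega u}\, g(\omega)\, d\omega$$

is a convex function of $u$ whenever $g$ is positive. In this case we take $g(\omega) = \beta^n(\omega) f(\omega) e^{-\omega\xi_2 - (n-1)\omega s_2}$. Writing $G(u) = \int_0^\infty e^{-\omega u} g(\omega)\,d\omega$, $a = \xi_1 - \xi_2 > 0$ and $b = (n-1)(s_1 - s_2) > 0$, the determinant equals $G(a+b)G(0) - G(a)G(b)$, which is nonnegative by log-convexity.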
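The monotone likelihood ratio claim for $\varphi_n(\xi\mid s)$ can be spot-checked numerically. The sketch uses two illustrative assumptions: $r(\xi) = 1$ for $\xi \ge 0$, and a Gamma$(a, b)$ prior for $\omega$. Under them, (18) has the closed form $k\lambda^k/(\xi+\lambda)^{k+1}$, with $k = n-1+a$ and $\lambda = (n-1)s + b$. The Python sketch below (NumPy assumed; function names hypothetical) first evaluates the determinant (24), normalized, at random points, with $s$ in the role of the parameter. It then checks the consequence used in the proof of Lemma 3: $E_s\,g(\xi)$ decreases in $s$ when $g$ is decreasing.

```python
import numpy as np


def phi_n(xi, s, n=10, a=2.0, b=1.0):
    """(18) for beta(w) = w, r = 1 and a Gamma(a, rate b) prior: a Lomax density in xi."""
    k, lam = n - 1 + a, (n - 1) * s + b
    return k * lam ** k / (xi + lam) ** (k + 1)


def normalized_det(x1, x2, s1, s2):
    """Determinant (24) with s as the parameter, divided by the sum of its two products."""
    d1 = phi_n(x1, s1) * phi_n(x2, s2)
    d2 = phi_n(x1, s2) * phi_n(x2, s1)
    return (d1 - d2) / (d1 + d2)


rng = np.random.default_rng(0)
dets = []
for _ in range(1000):
    x2, x1 = np.sort(rng.uniform(0.0, 10.0, 2))     # x1 > x2
    s2, s1 = np.sort(rng.uniform(0.1, 5.0, 2))      # s1 > s2
    dets.append(normalized_det(x1, x2, s1, s2))

# Consequence used for Lemma 3: E_s g(xi) decreases in s when g is decreasing.
XI = np.linspace(0.0, 60.0, 600_001)
g = np.exp(-XI)                                      # an arbitrary decreasing g


def expect_g(s):
    w = g * phi_n(XI, s)
    return float(np.sum(w[1:] + w[:-1]) * (XI[1] - XI[0]) / 2.0)


vals = [expect_g(s) for s in (0.5, 1.0, 2.0, 4.0)]
print(min(dets), vals)
```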


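We now proceed with the induction proof of Lemma 3. Let us assume that $\partial C_{n+1}^N(x,s)/\partial x$ is decreasing in $s$. From Lemma 2, we see that

(25) $$\frac{\partial C_n^N(x,s)}{\partial x} = \begin{cases} \dfrac{\partial L_n(x\mid s)}{\partial x} + \alpha\displaystyle\int_0^\infty \frac{\partial C_{n+1}^N}{\partial x}\Big(x - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi, & x \ge \bar{x}_n^N(s),\\[6pt] -c, & x \le \bar{x}_n^N(s).\end{cases}$$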


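Let us first of all show that

(26) $$G(x,s) = \frac{\partial L_n(x\mid s)}{\partial x} + \alpha\int_0^\infty \frac{\partial C_{n+1}^N}{\partial x}\Big(x - \xi,\ s + \frac{\xi - s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi$$

is decreasing in $s$. Since $\partial L_n(x\mid s)/\partial x$ is equal to $-p + (h+p)\int_0^x \varphi_n(\xi\mid s)\,d\xi$, we may rewrite the expression for $G(x,s)$ as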







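(27) $$G(x,s) = \int_0^\infty \Big[-p + (h+p)\chi_x(\xi) + \alpha\frac{\partial C_{n+1}^N}{\partial x}\Big(x - \xi,\ s + \frac{\xi - s}{n}\Big)\Big]\varphi_n(\xi\mid s)\,d\xi,$$

where $\chi_x(\xi)$ is the characteristic function of $(0, x)$. If $s_1 < s_2$, then by the induction assumption we have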


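(28) $$G(x,s_1) \ge \int_0^\infty \Big[-p + (h+p)\chi_x(\xi) + \alpha\frac{\partial C_{n+1}^N}{\partial x}\Big(x - \xi,\ s_2 + \frac{\xi - s_2}{n}\Big)\Big]\varphi_n(\xi\mid s_1)\,d\xi.$$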





The integrand in this expression is a monotone decreasing function of $\xi$. This follows from the convexity of $C_{n+1}^N$ with respect to its first argument and from the induction assumption. Using the property of distributions with a monotone likelihood ratio, we conclude that the integral is larger than the corresponding integral with $s_1$ replaced by $s_2$. This demonstrates that $G(x,s_1) \ge G(x,s_2)$.

The critical number $\bar{x}_n^N(s)$ is the solution of the equation $-c = G(x,s)$. This permits us to conclude that $\bar{x}_n^N(s_1) \le \bar{x}_n^N(s_2)$. To see this we write

$$G(\bar{x}_n^N(s_2), s_2) = -c = G(\bar{x}_n^N(s_1), s_1) \ge G(\bar{x}_n^N(s_1), s_2),$$

and recall that $G$ is an increasing function of its first argument (see the proof of Lemma 2).

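In order to complete the proof of this step of the induction, we consider three cases.

(a) $x \le \bar x^N_n(s_1)$. In this case
$$\frac{\partial C^N_n(x,s_1)}{\partial x} - \frac{\partial C^N_n(x,s_2)}{\partial x} = 0.$$

(b) $\bar x^N_n(s_1) \le x \le \bar x^N_n(s_2)$. In this case
$$\frac{\partial C^N_n(x,s_1)}{\partial x} - \frac{\partial C^N_n(x,s_2)}{\partial x} = G(x,s_1) + c \ge G(\bar x^N_n(s_1), s_1) + c = 0.$$

(c) $\bar x^N_n(s_2) < x$. In this case
$$\frac{\partial C^N_n(x,s_1)}{\partial x} - \frac{\partial C^N_n(x,s_2)}{\partial x} = G(x,s_1) - G(x,s_2) \ge 0.$$

This completes the proof of Lemma 3.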

As a consequence of Lemma 3, we see that for $b > a$, $C^N_n(b,s) - C^N_n(a,s)$ is decreasing in $s$, and passing to the limit we obtain that $C_n(b,s) - C_n(a,s)$ is decreasing in $s$. By a slight modification of the arguments of Lemma 3, we obtain

THEOREM 2. The critical stockage numbers $\bar x^N_n(s)$ are increasing functions of the statistic $s$. The critical numbers $\bar x_n(s)$ are also increasing functions of $s$.

The next question that we shall examine is the relationship between the critical 
numbers in successive time periods. Suppose that at the beginning of the nth 





500 HERBERT SCARF 


period the stock level $x$ is less than the critical level $\bar x_n(s)$, where $s$ represents the average demand in the preceding $n-1$ periods. The optimal rule is to bring the stock level up to $\bar x_n(s)$. If demand during the $n$th period is $\xi$, then the stock level at the beginning of the $(n+1)$st period is $\bar x_n(s) - \xi$, and the average demand is $s + (\xi - s)/n$. We shall address ourselves to the question of whether the optimal rule calls for positive ordering in the $(n+1)$st period if the demand $\xi$ is small. To say that there will be no positive ordering is equivalent to saying that $\bar x_n(s) - \xi \ge \bar x_{n+1}(s + (\xi - s)/n)$ for all small $\xi$, and in view of the conclusion of Theorem 2, this is equivalent to saying that
$$\bar x_n(s) > \bar x_{n+1}\big(s(n-1)/n\big).$$


This result is actually true.

THEOREM 3. The critical levels have the property that $\bar x_n(s) > \bar x_{n+1}(s(n-1)/n)$, or in other words, we do not order in any period if the demand has been sufficiently small during the previous period.

Let us assume to the contrary that there exist values of $n$ and $s$ such that
$$\bar x_n(s) \le \bar x_{n+1}\Big(\frac{n-1}{n}\,s\Big). \tag{29}$$
We know that (see (23)) $\bar x_n(s)$ is the unique minimum point of


$$cy + L_n(y\mid s) + \alpha\int_0^\infty C_{n+1}\Big(y-\xi,\; s+\frac{\xi-s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi. \tag{30}$$
For $y < \bar x_n(s)$ we will have, by assumption (29) and Theorem 2, $y - \xi < \bar x_{n+1}(s + (\xi - s)/n)$ for every $\xi \ge 0$, and therefore $C_{n+1}(y - \xi, s + (\xi - s)/n)$ will be a linear function of $y$ whose derivative is equal to $-c$. Therefore the derivative of (30) for any value of $y < \bar x_n(s)$ is equal to $c(1-\alpha) + \partial L_n(y\mid s)/\partial y$. It may also be shown that both the right and left hand derivatives of (30) exist and are equal at the point $\bar x_n(s)$, and we conclude that
$$c(1-\alpha) + \frac{\partial L_n(y\mid s)}{\partial y} = 0 \qquad\text{when } y = \bar x_n(s). \tag{31}$$


Now let us examine the equation which gives $\bar x_{n+1}(s(n-1)/n)$. As with (30), we know that $\bar x_{n+1}(s(n-1)/n)$ minimizes
$$cy + L_{n+1}\Big(y\,\Big|\,\frac{n-1}{n}s\Big) + \alpha\int_0^\infty C_{n+2}\Big(y-\xi,\; \frac{(n-1)s+\xi}{n+1}\Big)\varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s\Big)\,d\xi. \tag{32}$$
If we evaluate this function at $y = \bar x_{n+1}(s(n-1)/n)$, subtract the value for $y = \bar x_{n+1}(s(n-1)/n) - \delta$, divide by $\delta$, and use the following consequence of the convexity of $C_{n+2}$: $C_{n+2}(y-\xi,\cdot) - C_{n+2}(y-\delta-\xi,\cdot) \ge -c\delta$, then we obtain the conclusion







$$0 \ge c(1-\alpha) + \frac{\partial L_{n+1}(y\mid (n-1)s/n)}{\partial y} \qquad\text{at } y = \bar x_{n+1}\big(s(n-1)/n\big). \tag{33}$$
Since the right hand side of this inequality is an increasing function of $y$, it follows that $\bar x_{n+1}(s(n-1)/n)$ does not exceed the solution of the equation
$$c(1-\alpha) + \frac{\partial L_{n+1}(y\mid (n-1)s/n)}{\partial y} = 0, \tag{34}$$
and therefore by (29) it follows that $\bar x_n(s)$ does not exceed the solution of (34).

We shall show that this last statement is false, thereby proving the theorem. In other words, we shall show that the solution of (31) is always strictly greater than the corresponding solution of (34). In view of the specific nature of the functions involved (recall that $\partial L_n(y\mid s)/\partial y = -p + (h+p)\int_0^y \varphi_n(\xi\mid s)\,d\xi$), this will be correct if we can show that
$$\int_0^y \varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s\Big)\,d\xi > \int_0^y \varphi_n(\xi\mid s)\,d\xi, \qquad y > 0. \tag{35}$$
The proof of (35) is quite simple and again makes use of the monotonicity preserving property of densities with a monotone likelihood ratio. Since the characteristic function of the interval $(0,y)$ is a monotone decreasing function of $\xi$, it follows that


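$$\int_0^y \varphi_n(\xi\mid s)\,d\xi = \frac{\int_0^\infty \beta^n(w)\,e^{-(n-1)ws}\Big[\int_0^y e^{-w\xi}\,r(\xi)\,d\xi\Big] f(w)\,dw}{\int_0^\infty \beta^{n-1}(w)\,e^{-(n-1)ws}\,f(w)\,dw}$$
(see (19)) is a monotone decreasing function of $s$, and therefore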


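$$\int_0^y \varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s\Big)d\xi \ge \int_0^y \varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s + \frac{\tau}{n}\Big)d\xi = \frac{\int_0^\infty \beta^{n+1}(w)\,e^{-\tau w-(n-1)ws}\Big[\int_0^y e^{-w\xi}\,r(\xi)\,d\xi\Big] f(w)\,dw}{\int_0^\infty \beta^{n}(w)\,e^{-\tau w-(n-1)ws}\,f(w)\,dw}$$
for $\tau \ge 0$, that is,
$$\int_0^y \varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s\Big)d\xi \cdot \int_0^\infty \beta^n(w)\,e^{-\tau w-(n-1)ws} f(w)\,dw \ge \int_0^\infty \beta^{n+1}(w)\,e^{-\tau w-(n-1)ws}\Big[\int_0^y e^{-w\xi}\,r(\xi)\,d\xi\Big] f(w)\,dw, \tag{36}$$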







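with strict inequality holding for some values of $\tau$. If we multiply both sides of (36) by $r(\tau)$, integrate with respect to $\tau$ from zero to infinity, and remember that $\int_0^\infty e^{-w\tau}\,r(\tau)\,d\tau = 1/\beta(w)$, we obtain
$$\int_0^y \varphi_{n+1}\Big(\xi\,\Big|\,\frac{n-1}{n}s\Big)d\xi \cdot \int_0^\infty \beta^{n-1}(w)\,e^{-(n-1)ws} f(w)\,dw > \int_0^\infty \beta^{n}(w)\,e^{-(n-1)ws}\Big[\int_0^y e^{-w\xi}\,r(\xi)\,d\xi\Big] f(w)\,dw, \tag{37}$$
which is the same as (35). This completes the proof of Theorem 3.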
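Inequality (35), and the resulting ordering of the solutions of (31) and (34), can be seen explicitly in a simple conjugate model that is an assumption of this sketch: exponential demand ($\beta(w) = w$, $r(\xi) = 1$) with a Gamma prior of shape $a$ and rate $b$. Then $\int_0^y \varphi_n(\xi\mid s)\,d\xi = 1 - \big(B/(B+y)\big)^{A}$ with $A = a+n-1$ and $B = b+(n-1)s$. Passing from $(n, s)$ to $(n+1, (n-1)s/n)$ leaves $B$ unchanged and raises $A$ by one, which gives (35) directly. The cost parameters $h, p, c, \alpha$ below are hypothetical and only need $p > c(1-\alpha)$.

```python
import math


def predictive_cdf(y, s, n, a=2.0, b=1.0):
    """Integral of phi_n(xi | s) over (0, y) for exponential demand and a Gamma(a, b) prior."""
    shape, scale = a + n - 1, b + (n - 1) * s
    return 1.0 - (scale / (scale + y)) ** shape


def critical_root(s, n, h=1.0, p=4.0, c=1.0, alpha=0.9, a=2.0, b=1.0):
    """Root y of c(1 - alpha) + dL_n(y | s)/dy = 0, using dL_n/dy = -p + (h + p) Phi_n(y | s)."""
    q = (p - c * (1 - alpha)) / (h + p)  # target value of Phi_n; requires 0 < q < 1
    shape, scale = a + n - 1, b + (n - 1) * s
    return scale * math.expm1(-math.log(1 - q) / shape)


n, s = 10, 2.0
s_next = (n - 1) * s / n
for y in (0.5, 1.0, 3.0):
    # Inequality (35): the (n+1)-period CDF at (n-1)s/n dominates the n-period CDF at s.
    print(y, predictive_cdf(y, s_next, n + 1), predictive_cdf(y, s, n))
print(critical_root(s, n), critical_root(s_next, n + 1))  # solutions of (31) and (34)
```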


4. Asymptotic properties of the critical numbers. The sequence of past demands may be used to estimate the true demand distribution with increasing accuracy as the number of time periods becomes large. It seems reasonable to expect, therefore, that the critical number $\bar x_n(s)$ will, for large $n$, be quite close to the critical number $\bar x(s)$ obtained by assuming that the mean demand is $s$. We shall, in this section, demonstrate that $\bar x_n(s)$ does indeed tend to $\bar x(s)$, and then establish an asymptotic expansion of the form $\bar x_n(s) \sim \bar x(s) + a(s)/n$, with $a(s)$ given by (7). The proof of this fact will depend on obtaining certain asymptotic expansions for the a posteriori density of $w$, given that the demand in the first $n-1$ periods has a mean of $s$.

We shall summarize the necessary asymptotic results in the following two lemmas, without proof. In each case, the proof is a relatively straightforward application of Laplace's method for the evaluation of exponential integrals [5], [8]. The a posteriori density of $w$, given that $\xi_1 + \cdots + \xi_{n-1} = (n-1)s$, is


$$f_n(w\mid s) = \frac{\beta^{n-1}(w)\,e^{-(n-1)ws}\,f(w)}{\int_0^\infty \beta^{n-1}(w)\,e^{-(n-1)ws}\,f(w)\,dw}, \tag{38}$$
and the maximum likelihood estimate of $w$ is given by the solution of the equation
$$\frac{\beta'(w)}{\beta(w)} = s, \tag{39}$$
which we shall also write as $w(s)$. We shall only consider those values of $s$ for which $0 < s < \lim_{w\to 0}\beta'(w)/\beta(w)$.

LEMMA 4. $\varphi_n(\xi\mid s) \to \varphi(\xi\mid w(s))$ uniformly in every finite $\xi$ interval, if $f(w(s)) > 0$.


The proof of this fact may be obtained by minor modifications of the argument presented on p. 283 of [8].

LEMMA 5. Let $f(w)$ and $h(w)$ have continuous second derivatives in some neighborhood of the point $w(s)$, and let $f(w(s)) > 0$. Then
$$\int_0^\infty h(w)\,f_n(w\mid s)\,dw \sim h(w(s)) + \frac{\cdots}{n},$$







where $\sigma^2 = -d^2\log\beta/dw^2$, and the coefficient of $1/n$ is evaluated at the point $w(s)$. If only the first term is required, then continuity of $h$ and $f$ is sufficient.

This lemma may be demonstrated by an application of Laplace's method [5].
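The leading term of Lemma 5 can be observed numerically. The sketch below computes $\int_0^\infty h(w)\,f_n(w\mid s)\,dw$ on a grid, working with the logarithm of the unnormalized posterior (38) so that large $n$ does not overflow, and compares it with $h(w(s))$. The choices $\beta(w) = w$ (exponential demand, so that (39) gives $w(s) = 1/s$), an unnormalized Gamma(2, 1) prior and $h(w) = w^2$ are hypothetical. With these choices the posterior is a Gamma distribution, so the exact value is also available for comparison.

```python
import math


def posterior_expectation(h, log_beta, log_prior, s, n, lo=1e-6, hi=5.0, panels=20000):
    """Approximate the integral of h(w) f_n(w | s) dw with Simpson weights on [lo, hi]."""
    step = (hi - lo) / panels
    grid = [lo + i * step for i in range(panels + 1)]
    # log of beta^(n-1)(w) e^{-(n-1) w s} f(w); the normalizing constant cancels in the ratio
    logs = [(n - 1) * (log_beta(w) - w * s) + log_prior(w) for w in grid]
    top = max(logs)  # shift by the maximum to avoid overflow for large n
    weights = [(1 if i in (0, panels) else 4 if i % 2 else 2) * math.exp(v - top)
               for i, v in enumerate(logs)]
    return sum(wt * h(w) for wt, w in zip(weights, grid)) / sum(weights)


def square(w):
    return w * w


def log_gamma_prior(w):
    return math.log(w) - w  # unnormalized Gamma(2, 1) log-density


s = 2.0
for n in (10, 100, 1000):
    approx = posterior_expectation(square, math.log, log_gamma_prior, s, n)
    print(n, approx, square(1.0 / s), n * (approx - square(1.0 / s)))
```

The last column stays bounded as $n$ grows, consistent with an error of order $1/n$ in the leading term.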
Let us apply these results to verify that $\bar x_n(s) \to \bar x(s)$. The proof depends on the following simple observations. Let us consider an inventory problem in which the true value of $w$ is revealed after $n-1$ demands are experienced. In this situation the expectation of the discounted costs from the $n$th period onwards is
$$\int_0^\infty C(x,w)\,f_n(w\mid s)\,dw, \tag{40}$$
where $C(x,w)$ is the cost for the sequential inventory problem when the true value of the parameter is known to be $w$. It is possible, however, to use in this situation the strategy defined by the critical numbers $\bar x_n(s)$, which are optimal in the statistical problem. If this strategy is used, it disregards the information which is made available as to the true value of $w$, and consequently the expected cost incurred by the use of such a strategy will be at least (40). This cost is, of course, $C_n(x,s)$, and therefore $\int_0^\infty C(x,w)\,f_n(w\mid s)\,dw \le C_n(x,s)$.

On the other hand the following situation may be considered. Let us assume that after $n-1$ demands have occurred, no additional demand information will be made available to the decision maker. In this case the minimum expected cost from the $n$th period onward will be the same as in the non-statistical problem with a demand distribution given by $\varphi_n(\xi\mid s)$. If we denote this cost by $C^n(x\mid s)$, then clearly
$$\int_0^\infty C(x,w)\,f_n(w\mid s)\,dw \le C_n(x,s) \le C^n(x\mid s). \tag{41}$$
By Lemma 5, we see that $\int_0^\infty C(x,w)\,f_n(w\mid s)\,dw$ tends to $C(x,w(s))$. In order to apply Lemma 1, we must verify that $\varphi_n(\xi\mid s) \to \varphi(\xi\mid w(s))$ uniformly in every finite interval, and that the means converge. The first part of this statement is implied by Lemma 4, and the second part by an application of Lemma 5 to the function $h(w) = \beta(w)\int_0^\infty \xi\,e^{-w\xi}\,r(\xi)\,d\xi$. We have therefore demonstrated

LEMMA 6. If $f(w(s)) > 0$, then $C_n(x,s) \to C(x,w(s))$.

It is a simple consequence of this lemma that $\bar x_n(s) \to \bar x(s)$.

We have therefore established

THEOREM 4. Under the assumption that $f(w(s)) > 0$, we have $\bar x_n(s) \to \bar x(s)$, $\bar x(s)$ being given by (6).
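Theorem 4 can also be observed numerically. The sketch makes several illustrative assumptions: exponential demand with $\beta(w) = w$ and $r(\xi) = 1$, a Gamma prior with shape $a = 2$ and rate $b = 1$, and hypothetical costs. Under these assumptions, equation (39) gives $w(s) = 1/s$. Equation (6), in the form $c(1-\alpha) - p + (h+p)\Phi(\bar x(s)\mid w(s)) = 0$ with $\Phi(x\mid w) = 1 - e^{-wx}$, then gives $\bar x(s) = -s\log(1-q)$ for $q = (p - c(1-\alpha))/(h+p)$. The sketch compares this limit with the root of (31) as $n$ grows. It illustrates an $O(1/n)$ gap for that root and is not a computation of $a(s)$ in (7).

```python
import math


def root_of_31(s, n, q, a=2.0, b=1.0):
    """Root of c(1 - alpha) + dL_n(y | s)/dy = 0, i.e. Phi_n(y | s) = q, in the Gamma-exponential model."""
    shape, scale = a + n - 1, b + (n - 1) * s
    return scale * math.expm1(-math.log(1 - q) / shape)


def limit_critical_number(s, q):
    """x-bar(s) from equation (6) with Phi(x | w) = 1 - exp(-w x) and w(s) = 1 / s."""
    return -s * math.log(1 - q)


h, p, c, alpha = 1.0, 4.0, 1.0, 0.9  # hypothetical cost parameters with p > c(1 - alpha)
q = (p - c * (1 - alpha)) / (h + p)
s = 2.0
for n in (10, 100, 1000, 10000):
    gap = root_of_31(s, n, q) - limit_critical_number(s, q)
    print(n, gap, n * gap)  # n * gap settles to a constant, so the gap is O(1/n)
```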

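We now address ourselves to the problem of showing that the asymptotic expansion for $\bar x_n(s)$ given by (7) is indeed valid. Our first result is embodied in the following lemma.

LEMMA 7. For any fixed value of $s$,
$$0 \ge L_n'(\bar x_n(s)\mid s) + c(1-\alpha) = o\Big(\bar x_n(s) - \bar x_{n+1}\Big(\frac{n-1}{n}s\Big)\Big) \qquad\text{as } n\to\infty,$$
where $L_n'(y\mid s)$ denotes $\partial L_n(y\mid s)/\partial y$.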





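In order to demonstrate this lemma, we consider the cost function $C^N_n(x,s)$. According to Lemma 2 and its proof, we have
$$-c = L_n'(\bar x^N_n(s)\mid s) - \alpha c + \alpha\int_0^\infty\Big[\frac{\partial C^N_{n+1}}{\partial x}\Big(\bar x^N_n(s)-\xi,\; s+\frac{\xi-s}{n}\Big) + c\Big]\varphi_n(\xi\mid s)\,d\xi. \tag{42}$$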


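The integrand on the right hand side vanishes for all $\xi$ with $\bar x^N_n(s) - \xi < \bar x^N_{n+1}(s + (\xi - s)/n)$, and is positive for other values of $\xi$. For $N\to\infty$, we know that $\bar x^N_n(s) \to \bar x_n(s)$, and if $\xi > \bar x_n(s) - \bar x_{n+1}(s(n-1)/n)$, then $\bar x_n(s) - \xi < \bar x_{n+1}(s + (\xi - s)/n)$. (This follows from Lemma 3.) Therefore for $N$ large, the integrand vanishes for $\xi > \bar x_n(s) - \bar x_{n+1}(s(n-1)/n) + \delta_N$, where $\delta_N \to 0$ as $N$ tends to infinity. Therefore, since the integrand is a decreasing function of $\xi$ (Lemmas 2 and 3), we may write
$$0 \le \int_0^\infty\Big[\frac{\partial C^N_{n+1}}{\partial x}\Big(\bar x^N_n(s)-\xi,\; s+\frac{\xi-s}{n}\Big) + c\Big]\varphi_n(\xi\mid s)\,d\xi \le \Phi_n\Big(\bar x_n(s) - \bar x_{n+1}\Big(\frac{n-1}{n}s\Big) + \delta_N\,\Big|\,s\Big)\Big[\frac{\partial C^N_{n+1}}{\partial x}\Big(\bar x^N_n(s), \frac{n-1}{n}s\Big) + c\Big], \tag{43}$$
where $\Phi_n(u\mid s) = \int_0^u \varphi_n(\xi\mid s)\,d\xi$.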


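If $\epsilon$ is any positive number then, by convexity,
$$\frac{\partial C^N_{n+1}}{\partial x}\Big(\bar x^N_n(s), \frac{n-1}{n}s\Big) + c \le \frac{1}{\epsilon}\Big\{C^N_{n+1}\Big(\bar x^N_n(s)+\epsilon, \frac{n-1}{n}s\Big) - C^N_{n+1}\Big(\bar x^N_n(s), \frac{n-1}{n}s\Big)\Big\} + c.$$
If we substitute this in (42) and let $N\to\infty$, we obtain $0 \ge L_n'(\bar x_n(s)\mid s) + c(1-\alpha) \ge -\alpha\,\Phi_n\big(\bar x_n(s) - \bar x_{n+1}(s(n-1)/n)\mid s\big)$ multiplied by
$$\frac{1}{\epsilon}\Big\{C_{n+1}\Big(\bar x_n(s)+\epsilon, \frac{n-1}{n}s\Big) - C_{n+1}\Big(\bar x_n(s), \frac{n-1}{n}s\Big)\Big\} + c.$$
For $n$ large, $\Phi_n(u\mid s) \le Ku$ for $s$ away from $0$ and $\max_w(\beta'/\beta)(w)$. Also $(1/\epsilon)\{\cdots\} + c$ approaches $(1/\epsilon)\{C(\bar x(s)+\epsilon, w(s)) - C(\bar x(s), w(s))\} + c$, which may be made arbitrarily small by choosing $\epsilon$ small. This proves the lemma.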

Lemma 7 is not sufficient to demonstrate the main result of this section, except in certain cases. Let us suppose that there exists a subsequence of $\{\bar x_{n+1}(s(n-1)/n)\}$ which is greater than or equal to $\bar x(s)$. For this subsequence $\bar x_n(s) - \bar x_{n+1}(s(n-1)/n) \le \bar x_n(s) - \bar x(s)$, and Lemma 7 may be rephrased as $L_n'(\bar x_n(s)\mid s) + c(1-\alpha) = o\{\bar x_n(s) - \bar x(s)\}$. Then
$$L_n'(\bar x(s)\mid s) + c(1-\alpha) + [\bar x_n(s) - \bar x(s)]\,L_n''(\bar x(s)+\theta_n\mid s) = o\{\bar x_n(s) - \bar x(s)\}, \tag{44}$$







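where $\theta_n \to 0$. But
$$L_n'(\bar x(s)\mid s) = -p + (h+p)\int_0^{\bar x(s)}\varphi_n(\xi\mid s)\,d\xi \sim -p + (h+p)\,\Phi(\bar x(s)\mid w(s)) + \frac{h+p}{n}\,[\cdots]$$
by Lemma 5 (applied to the function $\Phi(\bar x(s)\mid w)$ of $w$, the bracket denoting the corresponding coefficient of $1/n$), and $L_n''(\bar x(s)+\theta_n\mid s) \to (h+p)\,\varphi(\bar x(s)\mid w(s))$. Since
$$c(1-\alpha) - p + (h+p)\,\Phi(\bar x(s)\mid w(s)) = 0$$
(equation (6)), this demonstrates the asymptotic expansion for $\bar x_n(s)$ given in (7).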

In order to demonstrate the asymptotic expansion for $\bar x_n(s)$ in general, we shall find it necessary to show that these functions satisfy a Lipschitz condition of order 1. This depends upon the following lemma.

LEMMA 8. Let $s$ be any point between zero and $\max_w(\beta'/\beta)(w)$. Then for $n$ sufficiently large $\partial^2 C^N_n(x,s)/\partial x\,\partial s$ exists except perhaps at the point $x = \bar x^N_n(s)$, and is bounded from below, independently of $N$.

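The existence of $\partial^2 C^N_n(x,s)/\partial x\,\partial s$ follows directly from its definition. From equation (25) we see that
$$\frac{\partial^2 C^N_n(x,s)}{\partial x\,\partial s} = \begin{cases} 0, & x < \bar x^N_n(s),\\[1ex] \dfrac{\partial}{\partial s}\Big[\dfrac{\partial L_n(x\mid s)}{\partial x} + \alpha\displaystyle\int_0^\infty \frac{\partial C^N_{n+1}}{\partial x}\Big(x-\xi,\; s+\frac{\xi-s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi\Big], & x > \bar x^N_n(s), \end{cases} \tag{45}$$
and the expression in brackets, which is the function $G(x,s)$ of (26), is decreasing in $s$, so that its derivative with respect to $s$ is negative for all values of $x$. We may therefore write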


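$$-\frac{\partial^2 C^N_n}{\partial x\,\partial s} \le -\frac{\partial}{\partial s}\Big[\frac{\partial L_n(x\mid s)}{\partial x} + \alpha\int_0^\infty \frac{\partial C^N_{n+1}}{\partial x}\Big(x-\xi,\; s+\frac{\xi-s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi\Big] \tag{46}$$
for all $x$. Now let us assume that we can find a sequence of functions $F_n(x\mid s)$ which satisfy the equations
$$\frac{\partial F_n}{\partial s} = \frac{\partial}{\partial s}\Big[\frac{\partial L_n(x\mid s)}{\partial x} + \alpha\int_0^\infty F_{n+1}\Big(x-\xi\,\Big|\,s+\frac{\xi-s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi\Big], \tag{47}$$
and are increasing in $x$ and decreasing in $s$. Then it is a simple matter to show that $\partial^2 C^N_n/\partial x\,\partial s \ge \partial F_n/\partial s$, and this will furnish us with the required bounds. It will be sufficient to obtain a solution to the equation
$$F_n(x\mid s) = (h+p)\,\Phi_n(x\mid s) + \alpha\int_0^\infty F_{n+1}\Big(x-\xi\,\Big|\,s+\frac{\xi-s}{n}\Big)\varphi_n(\xi\mid s)\,d\xi, \qquad x > 0. \tag{48}$$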







This equation may be shown to have the following solution. Define $\Phi^{(k)}(x\mid w)$ to be the $k$-fold convolution of the demand distribution, and define
$$M(x\mid w) = (h+p)\sum_{k=0}^\infty \alpha^k\,\Phi^{(k+1)}(x\mid w). \tag{49}$$
Then
$$F_n(x\mid s) = \frac{\int_0^\infty (\beta e^{-ws})^{n-1} M(x\mid w)\,f(w)\,dw}{\int_0^\infty (\beta e^{-ws})^{n-1} f(w)\,dw} \qquad\text{for } x \ge 0, \tag{50}$$
and $F_n(x\mid s) = 0$ for $x$ negative. This function may be substituted directly in (48) to demonstrate that it is indeed a solution. It may also be verified directly that $F_n(x\mid s)$ is increasing in $x$ and decreasing in $s$.
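The series (49) converges quickly because of the factor $\alpha^k$. As a sanity check, assume exponential demand with parameter $w$, an assumption made only for this sketch. Then $\Phi^{(k+1)}(x\mid w)$ is an Erlang distribution function, and a Laplace-transform calculation gives the closed form $\sum_{k\ge 0}\alpha^k\Phi^{(k+1)}(x\mid w) = \big(1 - e^{-(1-\alpha)wx}\big)/(1-\alpha)$. The code truncates the series and compares it with this closed form; the values of $h$, $p$ and $\alpha$ are hypothetical.

```python
import math


def erlang_cdf(x, w, shape):
    """P(xi_1 + ... + xi_shape <= x) for independent exponential demands with rate w."""
    term, partial = 1.0, 1.0  # term = (w x)^j / j!, partial = sum of terms for j < shape
    for j in range(1, shape):
        term *= w * x / j
        partial += term
    return 1.0 - math.exp(-w * x) * partial


def m_series(x, w, h=1.0, p=4.0, alpha=0.9, terms=400):
    """Truncation of M(x | w) = (h + p) * sum_k alpha^k Phi^(k+1)(x | w), as in (49)."""
    return (h + p) * sum(alpha ** k * erlang_cdf(x, w, k + 1) for k in range(terms))


def m_closed_form(x, w, h=1.0, p=4.0, alpha=0.9):
    """Closed form of the same sum, valid only for exponential demand."""
    return (h + p) * (1.0 - math.exp(-(1.0 - alpha) * w * x)) / (1.0 - alpha)


print(m_series(3.0, 0.5), m_closed_form(3.0, 0.5))
```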

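In order to demonstrate Lemma 8 it is sufficient to show that $\partial F_n(x\mid s)/\partial s$ is bounded from below as $n$ becomes infinite. Now
$$\frac{\partial F_n}{\partial s} = -(n-1)\left[\frac{\int_0^\infty w\,(\beta e^{-ws})^{n-1}M(x\mid w)f(w)\,dw}{\int_0^\infty (\beta e^{-ws})^{n-1}f(w)\,dw} - \frac{\int_0^\infty (\beta e^{-ws})^{n-1}M(x\mid w)f(w)\,dw\,\int_0^\infty w\,(\beta e^{-ws})^{n-1}f(w)\,dw}{\Big[\int_0^\infty (\beta e^{-ws})^{n-1}f(w)\,dw\Big]^2}\right].$$
In this form we may apply Lemma 5 directly, and conclude that
$$\frac{\partial F_n}{\partial s} = -(n-1)\left[w(s)\,M(x\mid w(s)) + O\Big(\frac1n\Big) - \Big\{M(x\mid w(s)) + O\Big(\frac1n\Big)\Big\}\Big\{w(s) + O\Big(\frac1n\Big)\Big\}\right] = O(1).$$
The application of Lemma 5 merely requires that $M(x\mid w)$ have a continuous second partial derivative with respect to $w$. But by direct calculation
$$\frac{\partial^2\Phi(x\mid w)}{\partial w^2} = \beta(w)\Big\{-\sigma^2(w)\int_0^x e^{-w\xi}\,r(\xi)\,d\xi + \int_0^x (\xi - m(w))^2\,e^{-w\xi}\,r(\xi)\,d\xi\Big\},$$
where $m(w) = \beta'(w)/\beta(w)$ is the mean demand, so that
$$\Big|\frac{\partial^2\Phi(x\mid w)}{\partial w^2}\Big| \le 2\sigma^2(w), \qquad\text{and similarly}\qquad \Big|\frac{\partial^2\Phi^{(k+1)}(x\mid w)}{\partial w^2}\Big| \le 2(k+1)\,\sigma^2(w).$$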
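The identity for $\partial^2\Phi/\partial w^2$ is easy to test numerically. The sketch assumes exponential demand ($\beta(w) = w$, $r(\xi) = 1$, so that $m(w) = 1/w$ and $\sigma^2(w) = 1/w^2$). It evaluates the right-hand side by quadrature and compares the result with a central finite difference of $\Phi(x\mid w) = 1 - e^{-wx}$. The exact value in this case is $-x^2e^{-wx}$. These choices are for illustration only.

```python
import math


def simpson(func, lo, hi, panels=2000):
    """Composite Simpson rule for the integral of func over [lo, hi] (panels must be even)."""
    step = (hi - lo) / panels
    total = func(lo) + func(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3


def phi_cdf(x, w):
    return 1.0 - math.exp(-w * x)  # Phi(x | w) for exponential demand


def second_derivative_formula(x, w):
    """beta(w) * { -sigma^2 * int_0^x e^{-w xi} d xi + int_0^x (xi - m)^2 e^{-w xi} d xi } with beta(w) = w."""
    mean, var = 1.0 / w, 1.0 / w ** 2
    first = simpson(lambda xi: math.exp(-w * xi), 0.0, x)
    second = simpson(lambda xi: (xi - mean) ** 2 * math.exp(-w * xi), 0.0, x)
    return w * (-var * first + second)


def second_derivative_fd(x, w, dw=1e-4):
    """Central finite-difference estimate of d^2 Phi / dw^2."""
    return (phi_cdf(x, w + dw) - 2 * phi_cdf(x, w) + phi_cdf(x, w - dw)) / dw ** 2


x, w = 1.5, 0.8
print(second_derivative_formula(x, w), second_derivative_fd(x, w), 2.0 / w ** 2)
```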










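This proves Lemma 8.

LEMMA 9. For any $s$ and $\delta$ sufficiently small there exists a constant $k$ such that $\bar x_n(s+\delta) - \bar x_n(s) \le k\delta$.

Again we consider the cost functions $C^N_n$. We have
$$\frac{\partial C^N_n}{\partial x}(\bar x^N_n(s+\delta), s) - \frac{\partial C^N_n}{\partial x}(\bar x^N_n(s), s) = [\bar x^N_n(s+\delta) - \bar x^N_n(s)]\,\frac{\partial^2 C^N_n}{\partial x^2}(y', s),$$
where $y'$ is some intermediary point. Since $\partial C^N_n/\partial x$ equals $-c$ both at $(\bar x^N_n(s), s)$ and at $(\bar x^N_n(s+\delta), s+\delta)$, this is also equal to
$$\frac{\partial C^N_n}{\partial x}(\bar x^N_n(s+\delta), s) - \frac{\partial C^N_n}{\partial x}(\bar x^N_n(s+\delta), s+\delta) = -\delta\,\frac{\partial^2 C^N_n}{\partial x\,\partial s}(\bar x^N_n(s+\delta), s'),$$
with $s'$ between $s$ and $s+\delta$. In other words
$$[\bar x^N_n(s+\delta) - \bar x^N_n(s)]\,\frac{\partial^2 C^N_n}{\partial x^2}(y', s) = -\delta\,\frac{\partial^2 C^N_n}{\partial x\,\partial s}(\bar x^N_n(s+\delta), s').$$
But $\partial^2 C^N_n(y', s)/\partial x^2$ is positive, and is in fact larger than $L_n''(y'\mid s)$ (see (22)). Also $-\partial^2 C^N_n/\partial x\,\partial s$ is positive, and by Lemma 8 is bounded, first as $N\to\infty$ and then as $n\to\infty$. This proves the lemma.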

Now let us give a proof of the general asymptotic expansion of the critical numbers. The method of proof is quite similar to that given above when
$$\bar x_{n+1}\big(s(n-1)/n\big) \ge \bar x(s),$$
and depends on showing, by means of Lemma 9, that $\bar x_n(s) - \bar x(s) = O(1/n)$. Consider the sequence $\bar x_n(s)$ and assume first of all that there exists an infinite sequence of subscripts $n_\nu$ such that
$$\bar x_{n_\nu}(s) \le \bar x_{n_\nu+1}(s), \tag{51}$$
with the reverse inequality holding for the other terms in the sequence. Then, by (51) and Lemma 9 with $\delta = s/n_\nu$,
$$0 < \bar x_{n_\nu}(s) - \bar x_{n_\nu+1}\Big(\frac{n_\nu-1}{n_\nu}s\Big) = \bar x_{n_\nu}(s) - \bar x_{n_\nu+1}(s) + \Big\{\bar x_{n_\nu+1}(s) - \bar x_{n_\nu+1}\Big(\frac{n_\nu-1}{n_\nu}s\Big)\Big\} \le \frac{ks}{n_\nu}.$$
We use the techniques given above, based on Lemma 7, to show that $\bar x(s) - \bar x_{n_\nu}(s) = O(1/n_\nu)$. Now let us consider any subsequence of $\bar x_n(s)$ all of whose elements are $< \bar x(s)$. Then if $n_\nu \le n' \le n_{\nu+1}$ we have $\bar x_{n'}(s) \ge \bar x_{n_{\nu+1}}(s)$, and $\bar x(s) - \bar x_{n'}(s) < K/n_{\nu+1} \le K/n'$. Therefore for any such subsequence $\bar x(s) - \bar x_n(s) = O(1/n)$. For any subsequence all of whose terms are $\ge \bar x(s)$, it is a trivial matter to show, using Lemma 7, that $\bar x_n(s) - \bar x(s) = O(1/n)$.







On the other hand, if we cannot find an infinite subsequence of the type described in (51), then we must have $\bar x_n(s)$ monotone decreasing after a while. Then $\bar x_n(s) > \bar x(s)$ for $n$ sufficiently large. This is enough to show directly from Lemma 7 that $\bar x_n(s) - \bar x(s) = O(1/n)$.

We have therefore shown that $\bar x_n(s) - \bar x(s) = O(1/n)$, regardless of any assumptions as to the form of the sequence $\bar x_n(s)$. Then
$$\bar x_n(s) - \bar x_{n+1}\Big(\frac{n-1}{n}s\Big) = [\bar x_n(s) - \bar x(s)] + [\bar x(s) - \bar x_{n+1}(s)] + \Big[\bar x_{n+1}(s) - \bar x_{n+1}\Big(\frac{n-1}{n}s\Big)\Big] = O\Big(\frac1n\Big).$$
If we apply Lemma 7, the general asymptotic expansion is obtained.


REFERENCES 


[1] Kenneth J. Arrow, Theodore E. Harris, and Jacob Marschak, "Optimal inventory policy," Econometrica, Vol. 19 (July, 1951), pp. 250-272.

[2] Kenneth J. Arrow, Samuel Karlin, and Herbert Scarf, Studies in the Mathematical Theory of Inventory and Production, Stanford University Press, Stanford, 1958.

[3] Aryeh Dvoretzky, J. Kiefer, and Jacob Wolfowitz, "The inventory problem," Econometrica, Vol. 20 (April, July, 1952), pp. 187-222, 450-466.

[4] A. Erdélyi, Asymptotic Expansions, Dover, 1956.

[5] Donald Guthrie, Jr. and M. V. Johns, Jr., "Bayes acceptance sampling procedures for large lots," to be published.

[6] Samuel Karlin, "The structure of dynamic programming problems," Naval Research Logistics Quarterly, Vol. 2 (December, 1955), pp. 285-294.

[7] Samuel Karlin and Herman Rubin, "The theory of decision procedures for distributions with monotone likelihood ratio," Ann. Math. Stat., Vol. 27 (1956), pp. 272-299.

[8] David Vernon Widder, The Laplace Transform, Princeton University Press, Princeton, 1941.










A GENERALIZATION OF THE BETA-DISTRIBUTION 


By J. G. Mauldon
Corpus Christi College, Oxford 


0. Summary. A class of distributions is defined and studied which includes as particular cases (cf. Section 13) the ordinary $\beta$-distribution, the (univariate) triangular distribution, the uniform distribution over any nondegenerate simplex, and a continuous range of other distributions over such a simplex, called basic $\beta$-distributions (Section 6) and immediately analogous to the ordinary $\beta$-distribution. Our class also includes (Section 13 (vi)) various (univariate and other) distributions which arise in connection with the random division of an interval.

The main results are given in Section 2, and further results for the univariate case are given in Section 8.

This paper is exclusively concerned with the mathematical theory. One application may, however, be mentioned, which will be considered in more detail elsewhere. Suppose we wish to test the hypothesis $H_0$ that $n-1$ numbers $y_1,\dots,y_{n-1}$ (all lying between 0 and 1) were drawn independently from a rectangular distribution over $(0,1)$. Let $u_1,\dots,u_n$ be the lengths of the $n$ intervals into which the $y_i$ divide the interval $(0,1)$. Then $H_0$ is equivalent to the hypothesis that the point with vector-coordinate $u$ is distributed uniformly over a certain non-degenerate simplex $S$, and a useful set of alternative hypotheses is the set of basic $n$-dimensional $\beta$-distributions. Hence (using Section 4) this theory can be used to find the power-functions of certain tests of the hypothesis $H_0$.
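The reduction of $H_0$ to a uniform distribution on a simplex can be tried out directly. The sketch below, an illustration that is not taken from the paper, draws $n-1$ uniform points and forms the $n$ spacings $u_1,\dots,u_n$. It compares them with draws from the basic $\beta$-distribution with all indices equal to 1, which is the uniform distribution on the simplex $u_j > 0$, $\sum u_j = 1$. The basic distribution is generated by normalizing independent Gamma variates, a standard construction that the paper does not use. Both samples should have mean close to $1/n$ in every coordinate.

```python
import random


def spacings(n, rng):
    """Lengths u_1, ..., u_n of the n intervals cut from (0, 1) by n - 1 uniform points."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [right - left for left, right in zip(edges, edges[1:])]


def basic_beta_sample(indices, rng):
    """One draw from a basic beta-distribution by normalizing independent Gamma variates."""
    gammas = [rng.gammavariate(p, 1.0) for p in indices]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(12345)
n, draws = 4, 20000
mean_spacing = [0.0] * n
mean_basic = [0.0] * n
for _ in range(draws):
    for j, (u, v) in enumerate(zip(spacings(n, rng), basic_beta_sample([1.0] * n, rng))):
        mean_spacing[j] += u / draws
        mean_basic[j] += v / draws
print(mean_spacing, mean_basic)  # every coordinate should be close to 1 / n = 0.25
```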


1. Definitions. Let $x^1,\dots,x^n$ be $n$ random variables with the joint distribution function $F = F(x^1,\dots,x^n)$. We shall be particularly concerned with a certain integral transform $\phi_p$ of this distribution, defined by the equation
$$\phi_p(t; a_1,\dots,a_n) = E\Big\{\Big(t - \sum_{j=1}^n a_j x^j\Big)^{-p}\Big\} \qquad (p > 0) \tag{1}$$
or, more formally, by the equation
$$\phi_p(t; a_1,\dots,a_n) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty}\Big(t - \sum_{j=1}^n a_j x^j\Big)^{-p}\,dF(x^1,\dots,x^n) \qquad (p > 0,\ \operatorname{im} t \ne 0), \tag{2}$$
where $E\{\cdot\}$ denotes, as usual, an expectation. Here $p$ is the exponent of the transformation and $t$ and $a_j$ ($j = 1,\dots,n$) are the $n+1$ (homogeneous) parameters of the transformation.

If $p$ is not an integer, it is necessary to specify which branch of the integrand is intended. In order to avoid ambiguity in this case we shall restrict the parameters $a_j$ to real values and the parameter $t$ to non-real values and then take the


Received November 18, 1957. 


509 





510 J. G. MAULDON 


principal value of the logarithm in the formula $(t-\lambda)^{-p} = \exp(-p\log(t-\lambda))$. It is easily seen that the integral (2) converges for all such values of $t$ and $a_j$.

Now suppose that there is a set of $r + rn$ real constants $p_i$ $(>0)$ and $c_i^j$ ($i = 1,\dots,r$; $j = 1,\dots,n$) such that
$$\phi_p(t; a_1,\dots,a_n) = \prod_{i=1}^r\Big(t - \sum_{j=1}^n a_j c_i^j\Big)^{-p_i}, \tag{3}$$
where, taking $a_j = 0$ ($j = 1,\dots,n$), it is clear that
$$p = \sum_{i=1}^r p_i. \tag{4}$$
DEFINITION. Any joint distribution of $x^1,\dots,x^n$ which satisfies (3) (where $\phi_p$ is defined in (2)) will be called an $n$-dimensional $\beta$-distribution with indices $p_1,\dots,p_r$. The $n\times r$ matrix $C = [c_i^j]$ will be called the coordinate matrix and the columns $c_i$ ($i = 1,\dots,r$) of $C$ will be called the vertices¹ of the $\beta$-distribution. Lastly, the sum (4) of the indices will be called the exponent of the $\beta$-distribution.
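For a concrete instance of definition (3), take $n = 1$, $r = 2$, and vertices $c_1 = 1$ and $c_2 = 0$. The corresponding $\beta$-distribution is the ordinary Beta$(p_1, p_2)$ law (this follows from Lemma 5 and Lemma 2 below with $M = [1\ \ 0]$), and (3) reads $E\{(t-x)^{-p}\} = (t-1)^{-p_1}t^{-p_2}$ with $p = p_1 + p_2$. The sketch checks this by quadrature at two non-real values of $t$ and at one real $t > 1$, where both sides are real. It uses non-integer indices, so the choice of branch matters. Python's complex power takes the principal value of the logarithm, which matches the convention above. The indices and values of $t$ are arbitrary test choices.

```python
import math


def beta_transform(t, p1, p2, panels=20000):
    """E{(t - x)^(-p)} for x ~ Beta(p1, p2) and p = p1 + p2, by Simpson's rule on [0, 1]."""
    p = p1 + p2
    norm = math.gamma(p) / (math.gamma(p1) * math.gamma(p2))
    step = 1.0 / panels
    total = 0j
    for i in range(panels + 1):
        x = i * step
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        density = norm * x ** (p1 - 1) * (1 - x) ** (p2 - 1)
        total += weight * density * (t - x) ** (-p)  # principal branch for complex t
    return total * step / 3


def product_form(t, p1, p2):
    """Right-hand side of (3) with vertices c_1 = 1, c_2 = 0 and indices p1, p2."""
    return (t - 1) ** (-p1) * t ** (-p2)


for t in (1.5, 0.5 + 1j, -2.0 + 0.3j):
    print(t, beta_transform(t, 1.5, 2.2), product_form(t, 1.5, 2.2))
```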


2. Main results.

THEOREM 1. For any given set of $r + rn$ real constants $p_i$ $(>0)$ and $c_i^j$ ($i = 1,\dots,r$; $j = 1,\dots,n$), there is exactly one joint distribution of the variates $x^1,\dots,x^n$ which satisfies equation (3) (subject to (2) and (4)).

THEOREM 2. Let $D$ be the convex hull of the $r$ points whose vector coordinates are $c_i$, suppose that $D$ is of dimension $d$ ($1 \le d \le \min(r-1, n)$), and let $\mathcal{E}$ be the $d$-dimensional hyperplane containing $D$. Then any $\beta$-distribution with vertices $c_i$ is a continuous distribution over $\mathcal{E}$ with positive $d$-dimensional density at all interior points of $D$ and zero density elsewhere.

Further, if $p_i \ge 1$ (all $i$) the density of the distribution is bounded, and if $p_i > 1$ (all $i$) the density is continuous over $\mathcal{E}$.

The proof (Section 7) of Theorem 2 will show that if $d = 0$ the distribution is concentrated at a single point. Conversely, it is clear that any distribution which is concentrated at a single point is a $\beta$-distribution all of whose vertices have the same coordinates as the point, and whose exponent is arbitrary. On the other hand (Section 12) we shall prove

THEOREM 3. Any $\beta$-distribution which is not concentrated at a single point admits only one possible value for its exponent,

which, from (2) and (3), has the immediate

COROLLARY. Any $\beta$-distribution which is not concentrated at a single point has a unique set of distinct vertices, and its corresponding indices are also uniquely determined.


3. The univariate case. A preliminary lemma. If n = 1 we may drop the 





¹ Thus (e.g.) two identical columns of $C$ will be regarded as defining two coincident vertices of the distribution, each with its appropriate index.





BETA-DISTRIBUTION GENERALIZED


superscript $j$. Then, writing $t$ for $t/a_1$, we find that (3) may be written
$$E\{(t-x)^{-p}\} = \prod_{i=1}^r (t-c_i)^{-p_i} \qquad (\operatorname{im} t \ne 0). \tag{5}$$
Applying Theorem 1 of [3], we deduce that, if $F(x)$ is the distribution function of a unidimensional $\beta$-distribution, then, for almost all values of $x$ and $y$,
$$F(y) - F(x) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} K(t)\,dt, \tag{6}$$


where 


(7) K(t) = {1 (t.y r I (=2)} 


and the integral in (6), taken along the imaginary axis, converges (in particular at $t = 0$).

Now if $y > x > \max c_i$, the integrand $K(t)$ is a regular function of $t$ in the region $\operatorname{re} t > 0$, and so the path of integration in (6) may be deformed into the infinite semicircle in this region. Hence, since $|K(t)| = O(|t|^{-2})$, it follows that in this case (i.e. for almost all $x$ and $y$ such that $y > x > \max c_i$) $F(x) = F(y) = \lim_{y\to\infty}F(y) = 1$. Using the monotonicity of $F(x)$, it follows that $F(x) = 1$ for all $x > \max c_i$, and similarly we may prove that $F(x) = 0$ for all $x < \min c_i$. This yields immediately

LEMMA 1. Any unidimensional $\beta$-distribution is bounded.


4. The invariance properties of the class of $\beta$-distributions. Suppose that the $n$-dimensional vector-variate $x$ is a $\beta$-variate with vertices $c_1,\dots,c_r$ and indices $p_1,\dots,p_r$. Regard $x = \{x^j\}$ as a column-vector and $a = \{a_j\}$ as a row-vector. Then (3) may be written
$$E\{(t - ax)^{-p}\} = \prod_{i=1}^r (t - ac_i)^{-p_i} \qquad (\operatorname{im} t \ne 0). \tag{8}$$
Let $M$ be any $m\times n$ matrix with real coefficients and $v$ any real $m$-dimensional column-vector, and define the $m$-dimensional column-vectors $\xi$ and $\gamma_i$ by means of the equations
$$\xi = Mx + v, \qquad \gamma_i = Mc_i + v \quad (i = 1,\dots,r). \tag{9}$$
Next, let $\alpha$ be an arbitrary real $m$-dimensional row-vector and $\tau$ an arbitrary non-real scalar parameter, and write $a = \alpha M$, $t = \tau - \alpha v$. Then it follows that
$$\begin{aligned} \tau - \alpha\xi &= \tau - \alpha(Mx + v) = t - ax,\\ \tau - \alpha\gamma_i &= \tau - \alpha(Mc_i + v) = t - ac_i \quad (i = 1,\dots,r). \end{aligned} \tag{10}$$
Comparing (10) with (8), it follows that $\xi$, defined in (9), is an $m$-dimensional $\beta$-variate with vertices $\gamma_1,\dots,\gamma_r$ and indices $p_1,\dots,p_r$.
In particular, taking $v = 0$, we deduce

LEMMA 2. If there exists a $\beta$-distribution with indices $p_1,\dots,p_r$ and coordinate







matrix $C$, then there exists a $\beta$-distribution with indices $p_1,\dots,p_r$ and coordinate matrix $MC$, where $M$ is any real matrix such that the product $MC$ is defined,

which has the immediate

COROLLARY. If there exists a $\beta$-distribution with indices $p_1,\dots,p_r$ and coordinate matrix $I = I_{(r)}$ = the unit matrix of order $r$, then there exists a $\beta$-distribution with indices $p_1,\dots,p_r$ and coordinate matrix $M$, where $M$ is any real matrix with $r$ columns.
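Lemma 2 and its Corollary suggest a direct way to simulate a $\beta$-distribution with prescribed vertices: draw from the basic distribution with the required indices (Section 6) and multiply by the coordinate matrix. The sketch below does this, generating the basic distribution by normalizing independent Gamma variates (a standard construction that the paper does not use). It then compares the sample mean with the centroid of masses $p_i$ at the vertices, as stated in footnote 2 of Section 5. The matrix and indices are hypothetical.

```python
import random


def basic_beta_sample(indices, rng):
    """Draw from the basic beta-distribution (12)-(13) by normalizing independent Gamma variates."""
    gammas = [rng.gammavariate(p, 1.0) for p in indices]
    total = sum(gammas)
    return [g / total for g in gammas]


def beta_sample(coord_matrix, indices, rng):
    """Draw x = C y, where y is basic with the given indices and C is n x r (a list of rows)."""
    y = basic_beta_sample(indices, rng)
    return [sum(c_ij * y_j for c_ij, y_j in zip(row, y)) for row in coord_matrix]


def centroid(coord_matrix, indices):
    """Centroid of masses p_i placed at the vertices (columns) of the coordinate matrix."""
    total = sum(indices)
    return [sum(c_ij * p_j for c_ij, p_j in zip(row, indices)) / total for row in coord_matrix]


C = [[0.0, 2.0, 1.0],  # hypothetical 2 x 3 coordinate matrix: vertices (0, 0), (2, 0), (1, 3)
     [0.0, 0.0, 3.0]]
P = [0.5, 1.0, 2.5]     # hypothetical indices
rng = random.Random(7)
draws = 50000
mean = [0.0, 0.0]
for _ in range(draws):
    x = beta_sample(C, P, rng)
    mean = [m + xi / draws for m, xi in zip(mean, x)]
print(mean, centroid(C, P))
```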

It has been proved that the class of $\beta$-distributions is invariant under the transformation (9), for any real $m\times n$ matrix $M$ and any real $m$-dimensional column-vector $v$. This transformation includes the following particular cases:

(i) $M$ non-singular. This, the general non-singular linear transformation, amounts to making an arbitrary choice of cartesian coordinate axes (not necessarily rectangular). It may also be regarded as a translation, dilation and generalized "rotation" of the distribution, the axes being kept fixed.

(ii) $M = \begin{bmatrix} I\\ 0 \end{bmatrix}$, $v = 0$. This is the process of embedding the $\beta$-distribution in a space of higher dimension, giving, of course, a singular distribution in the new space.

(iii) $M = [I\ \ 0]$, $v = 0$. This takes the marginal distribution of a given set of the original joint variates, and is a particular case of

(iv) a parallel projection, for which $M^2 = M$, $v = 0$. (In order to include (iii) in this case, we must regard the marginal distribution as a singular distribution in the whole space, which is done by adding $n - m$ rows of zeros to the matrix $M = [I\ \ 0]$.)


5. Boundedness and uniqueness. Case (iii) of Section 4 shows that the marginal distribution of any single component $x^j$ of an $n$-dimensional $\beta$-variate $x$ is itself a unidimensional $\beta$-distribution and hence (by Lemma 1, Section 3) it is a bounded distribution. But this implies that the distribution of $x$ is itself bounded, and hence we have

LEMMA 3. Any $\beta$-distribution is bounded.

Write (3) in the form
$$E\Big\{\Big(1 - \sum_{j=1}^n b_j x^j\Big)^{-p}\Big\} = \prod_{i=1}^r\Big(1 - \sum_{j=1}^n b_j c_i^j\Big)^{-p_i}, \tag{11}$$
where $b_j = a_j/t$. Then, for sufficiently small values of $\max|b_j|$, each side of (11) can be expanded as an absolutely convergent series in powers and products of the $b_j$. Equating coefficients determines all the moments² of the joint distribution of $x^1,\dots,x^n$. Applying Lemma 3 and using the fact [5] that a distribution known to be bounded is uniquely determined by its moments, we have

LEMMA 4. If there exists any $\beta$-distribution with a given set of indices and vertices (i.e. any distribution satisfying (3)), then there is only one such distribution.

² E.g. the mean of a $\beta$-distribution is the centroid of masses $p_i$ at the vertices $c_i$.







6. Basic $\beta$-distributions. Consider the (singular) joint distribution of $x^1,\dots,x^r$ which is distributed over all positive values of the variates such that $x^1 + x^2 + \cdots + x^r = 1$, with density proportional to $\prod_{j=1}^r (x^j)^{p_j-1}$, where $p_1,\dots,p_r$ are given positive constants. Formally we have
$$dF = C\,(x^1)^{p_1-1}(x^2)^{p_2-1}\cdots(x^r)^{p_r-1}\,dx^1\,dx^2\cdots dx^{r-1} \tag{12}$$
over the range
$$x^j > 0 \quad (j = 1,\dots,r), \qquad \sum_{j=1}^r x^j = 1, \tag{13}$$
where $C = C(p_1,\dots,p_r) = \Gamma(p)/\prod\Gamma(p_j)$ (writing, as usual, $p = \sum p_j$). For this distribution, choosing $p = \sum p_j$ as the exponent of the fundamental transformation, it follows from (2) that
$$\phi = \phi_p(t; a_1,\dots,a_r) = C\int\!\!\cdots\!\!\int\Big(t - \sum_{j=1}^r a_j x^j\Big)^{-p}\prod_{j=1}^r (x^j)^{p_j-1}\,dx^1\,dx^2\cdots dx^{r-1}, \tag{14}$$
where (13) defines $x^r$ in terms of $x^1,\dots,x^{r-1}$ and the integration is extended over all positive values of the variables such that $x^1 + x^2 + \cdots + x^{r-1} < 1$.

Hence, if $d\phi(m_1, m_2,\dots,m_r; 0)$ denotes the result of differentiating $\phi$ $m_j$ times with respect to $a_j$ ($j = 1,\dots,r$) and then putting $a_j = 0$ ($j = 1,\dots,r$), we have, on writing $m$ for $\sum m_j$,
$$d\phi(m_1,\dots,m_r; 0) = \frac{C\,\Gamma(p+m)}{\Gamma(p)\,t^{p+m}}\int\!\!\cdots\!\!\int\prod_{j=1}^r (x^j)^{p_j+m_j-1}\,dx^1\cdots dx^{r-1}, \tag{15}$$
taken over the same range as the integral (14). This yields at once
$$d\phi(m_1, m_2,\dots,m_r; 0) = \prod_{j=1}^r\frac{\Gamma(p_j+m_j)}{\Gamma(p_j)}\;t^{-p-m}. \tag{16}$$
On comparing these partial derivatives (for all values of the $m_j$) with the corresponding derivatives of the function on the right-hand side of (17) below, and using the principle of analytic continuation, we reach the conclusion that
$$\phi_p(t; a_1,\dots,a_r) = \prod_{j=1}^r (t - a_j)^{-p_j}, \tag{17}$$
and hence³ we have proved

LEMMA 5. The distribution defined in (12) and (13) is an $r$-dimensional $\beta$-distribution with indices $p_1,\dots,p_r$ and coordinate matrix $I = I_{(r)}$ = the unit matrix of order $r$.
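Differentiating $(t - \sum a_j x^j)^{-p}$ produces the factor $\Gamma(p+m)\,t^{-p-m}/\Gamma(p)$ in front of $E\{\prod_j (x^j)^{m_j}\}$. Dividing (16) by that factor gives the mixed moments of the basic distribution,
$$E\Big\{\prod_j (x^j)^{m_j}\Big\} = \frac{\Gamma(p)}{\Gamma(p+m)}\prod_j \frac{\Gamma(p_j+m_j)}{\Gamma(p_j)}.$$
The sketch checks this for $r = 2$, where $x^1$ has the Beta$(p_1, p_2)$ density and $x^2 = 1 - x^1$, by one-dimensional quadrature. The indices and powers are arbitrary test values.

```python
import math


def moment_formula(indices, powers):
    """Gamma(p)/Gamma(p+m) * prod_j Gamma(p_j+m_j)/Gamma(p_j) for the basic beta-distribution."""
    p, m = sum(indices), sum(powers)
    value = math.gamma(p) / math.gamma(p + m)
    for p_j, m_j in zip(indices, powers):
        value *= math.gamma(p_j + m_j) / math.gamma(p_j)
    return value


def moment_by_quadrature(p1, p2, m1, m2, panels=20000):
    """E{(x^1)^m1 (x^2)^m2} for r = 2, integrating against the Beta(p1, p2) density of x^1."""
    norm = math.gamma(p1 + p2) / (math.gamma(p1) * math.gamma(p2))
    step = 1.0 / panels
    total = 0.0
    for i in range(panels + 1):
        x = i * step
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * norm * x ** (p1 + m1 - 1) * (1 - x) ** (p2 + m2 - 1)
    return total * step / 3


print(moment_formula([2.0, 3.0], [2, 1]), moment_by_quadrature(2.0, 3.0, 2, 1))
```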
It is convenient also to adopt the 


³ This proof, suggested by a referee, is much shorter than my original proof. This accounts for the absence of equations numbered (18) to (25).







DEFINITION. Any $\beta$-distribution whose coordinate matrix is a unit matrix will be called a basic $\beta$-distribution. By Lemma 4 of Section 5 it is clear that any basic $\beta$-distribution is defined by (12) and (13) for some set of indices $p_1,\dots,p_r$.

7. Proof of Theorems 1 and 2. By Lemma 5 and the Corollary to Lemma 2 (Section 4), there exists a $\beta$-distribution with any specified set of vertices and indices. By Lemma 4 (Section 5) this distribution is unique, and so we have completed the proof of Theorem 1.

The truth of Theorem 2 for basic $\beta$-distributions follows from (12) and (13). Using the Corollary to Lemma 2 (Section 4) we see that any $\beta$-distribution is obtained from a basic $\beta$-distribution by a transformation of the type (9); indeed, we may take $v = 0$. Hence it only remains to prove that the properties of a $\beta$-distribution asserted in Theorem 2 remain invariant under any transformation of type (9) with $v = 0$.

Now the ordinary theory of canonical matrices shows that any matrix $M$ may be expressed as a product $PABQ$, where the matrices $P$, $A$, $B$, $Q$ correspond to the particular cases of (9) described in (i), (ii), (iii), (i) of Section 4, and it is easily seen that the properties in question remain invariant under the transformations (i), (ii) and (iii) of Section 4. Hence it is true that the relevant properties remain invariant under the transformation (9) (with $v = 0$), and so the proof of Theorem 2 is complete.


8. The univariate case. Analytic continuation. In Sections 9-11, I shall prove the three theorems enunciated in this paragraph, with their corollaries.

THEOREM 4. Let $F(x)$ be the distribution function of a unidimensional $\beta$-distribution with vertices $c_1,\dots,c_r$, indices $p_1,\dots,p_r$, and exponent $p = \sum p_i$, and suppose that $r \ge 2$ and $c_1 < c_2 < \cdots < c_r$. Then, for each value of $k$ ($1 \le k \le r-1$) there exists a unique function $G_k(z)$ such that
(i) $G_k(x) = F(x)$ for $c_k < x < c_{k+1}$, and
(ii) $G_k(z)$ is a regular function of $z$ throughout the complex plane, cut along the real axis from $-\infty$ to $c_k$ and from $c_{k+1}$ to $+\infty$.

Further properties of the derivative $G_k'(z)$ of $G_k(z)$, and of its $p$th derivative $G_k^{(p)}(z)$ when $p$ is an integer, are given in

THEOREM 5.
(i) If $r = 2$,
$$G_1'(z) = \frac{\Gamma(p)}{\Gamma(p_1)\,\Gamma(p_2)}\,(z-c_1)^{p_2-1}(c_2-z)^{p_1-1}(c_2-c_1)^{1-p}. \tag{26}$$
(ii) If $p$ is an integer (e.g. $p = 1$),
$$G_k^{(p)}(z) = (-1)^{p-1}(p-1)!\,\pi^{-1}\sin(\alpha\pi)\prod_{j=1}^k (z-c_j)^{-p_j}\prod_{j=k+1}^r (c_j-z)^{-p_j}, \tag{27}$$
where $\alpha = \sum_{j=k+1}^r p_j$.
The asymptotic properties of the function G,(z) are described in 





BETA-DISTRIBUTION GENERALIZED 515 


THEOREM 6. If the $z$-plane is cut along the real axis from $-\infty$ to $c_k$ and from $c_{k+1}$ to $+\infty$, then, as $|z| \to \infty$,
$$G_k'(z) \sim A_k (z - c_k)^{Q-1}(c_{k+1} - z)^{P-1}, \tag{28}$$
where $A_k > 0$, $P = \sum_{j=1}^{k} p_j$, $Q = \sum_{j=k+1}^{r} p_j$, and we use that branch of the function which is positive for $c_k < z < c_{k+1}$.

COROLLARY 1. $G_k'(z) \sim B_k z^{p-2}$ as $|z| \to \infty$, where $B_k \ne 0$.

COROLLARY 2. Theorem 3 is true in the univariate case.

COROLLARY 3. If $F_1(x)$, $F_2(x)$ are the distribution functions of two unidimensional β-distributions and if, at an infinity of distinct points $x$, either $F_1'(x) = AF_2'(x) \ne 0$ or $F_1(x) = AF_2(x) + B \ne 0$ or $1$, then the two β-distributions have the same exponent.

I have been unable to prove an analogue of Corollary 3 in the multivariate 
case, and I therefore confine myself to offering the 

CONJECTURE. If two $n$-dimensional β-distributions have the same non-zero $d$-dimensional density throughout some $d$-dimensional region, then the two β-distributions have the same exponent.


9. Proof of Theorem 4. Let $F(x)$ satisfy the conditions of Theorem 4, and select an integer $k$ such that $1 \le k \le r - 1$.

We shall have occasion to consider two complex variables, $z$ and $t$. The $z$-plane will be cut along the real axis from $-\infty$ to $c_k$ and from $c_{k+1}$ to $+\infty$, and (for any given value of $z$) the $t$-plane will have straight cuts from the origin $t = 0$ to each of the points $t = c_j - z$ ($j = 1, \dots, r$).

Let $\Gamma_k$ be a contour in the cut $t$-plane which starts and finishes at $t = 0$ and encloses (positively) all the points $c_j - z$ ($1 \le j \le k$) and none of the points $c_j - z$ ($k + 1 \le j \le r$). We now define
$$G_k(z) = \frac{1}{2\pi i}\int_{\Gamma_k} t^{p-1}\prod_{j=1}^{r}(t + z - c_j)^{-p_j}\,dt, \tag{29}$$
using that branch of the integrand (in the cut $t$-plane) which $\sim t^{-1}$ as $|t| \to \infty$. Clearly $G_k(z)$ is a regular function of $z$ in the cut $z$-plane.

Now the distribution $F(x)$, being a β-distribution, is bounded, and so we may apply equation (4) of [3]; indeed, if $c_k < x < c_{k+1}$, the path of integration may be deformed into $\Gamma_k$. Comparing with (29) we find $F(x) = G_k(x)$ for almost all values of $x$ such that $c_k < x < c_{k+1}$. Using the fact that $G_k(z)$ is continuous and $F(x)$ is monotonic, this result may be extended to all values of $x$ in the range $(c_k, c_{k+1})$, and so $G_k(z)$ is the required analytic continuation of $F(x)$; it is certainly unique, by the elementary theory of analytic continuation.


10. Proof of Theorem 5.

(i) The case $r = 2$. Starting from the basic β-distribution defined in (12) and (13), we make the substitution $x - c_1 = (c_2 - c_1)x^2$, so that $c_2 - x = (c_2 - c_1)x^1$. This yields (26), which therefore gives the density over $(c_1, c_2)$ of the unidimensional β-distribution with vertices $c_1, c_2$ and indices $p_1, p_2$.





516 J. G. MAULDON 


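(ii) The case $p = 1$. It is convenient to write
$$\Pi(t, z) = \prod_{j=1}^{r}(t + z - c_j)^{-p_j}.$$
Then we have, from (29),
$$G_k'(z) = \frac{1}{2\pi i}\int_{\Gamma_k}\frac{\partial \Pi}{\partial z}\,dt = \frac{1}{2\pi i}\int_{\Gamma_k}\frac{\partial \Pi}{\partial t}\,dt, \tag{30}$$
which is $(2\pi i)^{-1}$ times the difference between two branches at $t = 0$ of the function $\Pi(t, z)$. On evaluating this difference we reach (27).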


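(iii) $p$ an integer $> 1$. Integrating by parts, we find
$$G_k'(z) = \frac{1}{2\pi i}\int_{\Gamma_k} t^{p-1}\,\frac{\partial \Pi}{\partial t}\,dt = \frac{1 - p}{2\pi i}\int_{\Gamma_k}\Pi(t, z)\,t^{p-2}\,dt. \tag{31}$$
Repeating the process $p$ times and using (30) at the last step, we again reach (27) in this more general case.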
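As a numerical sanity check, the sketch below evaluates both (26) and the $p = 1$ case of (27). For $r = 2$, (26) reads $\Gamma(p)/(\Gamma(p_1)\Gamma(p_2))\,(x - c_1)^{p_2-1}(c_2 - x)^{p_1-1}/(c_2 - c_1)^{p-1}$. With $p = 1$, (27) reads $\pi^{-1}\sin(\alpha\pi)\prod_{j\le k}(x - c_j)^{-p_j}\prod_{j>k}(c_j - x)^{-p_j}$ with $\alpha = p_1 + \dots + p_k$. When $p_1 + p_2 = 1$ the two must agree, because of the reflection formula $\Gamma(p_1)\Gamma(1 - p_1) = \pi/\sin(p_1\pi)$. The code uses the Python standard library only, and the function names are illustrative. The normalization check uses a plain midpoint rule and assumes $p_1, p_2 \ge 1$, so that the integrand is bounded.

```python
import math

def density_r2(x, c1, c2, p1, p2):
    """Formula (26): density on (c1, c2) of the beta-distribution with
    vertices c1 < c2 and indices p1, p2 (exponent p = p1 + p2)."""
    p = p1 + p2
    const = math.gamma(p) / (math.gamma(p1) * math.gamma(p2))
    return const * (x - c1) ** (p2 - 1) * (c2 - x) ** (p1 - 1) / (c2 - c1) ** (p - 1)

def density_p1(x, vertices, indices, k):
    """Formula (27) with p = 1: density on (c_k, c_{k+1}); k is 1-based."""
    if not math.isclose(sum(indices), 1.0):
        raise ValueError("this helper only covers exponent p = 1")
    alpha = sum(indices[:k])
    value = math.sin(alpha * math.pi) / math.pi
    for j, (c, pj) in enumerate(zip(vertices, indices), start=1):
        value *= (x - c) ** (-pj) if j <= k else (c - x) ** (-pj)
    return value

def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule on (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    # p1 + p2 = 1: (26) and (27) should coincide.
    print(density_r2(0.4, 0.0, 1.0, 0.3, 0.7),
          density_p1(0.4, [0.0, 1.0], [0.3, 0.7], 1))
    # p1, p2 >= 1: (26) should integrate to 1 over (c1, c2).
    print(midpoint_integral(lambda x: density_r2(x, -1.0, 2.0, 2.0, 3.0), -1.0, 2.0))
```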


11. Proof of Theorem 6 and its corollaries.
(i) The case $p > 1$. Consider the transformation $t \to w$ defined by
$$t = -z^2/(w + z). \tag{32}$$
The points $t = c_j - z$ are transformed into points $w = c_j + O(|z|^{-1})$ and the point $t = 0$ goes to infinity. Notice also that $\Gamma_k$ leaves the first $k$ singularities of the integrand on the left, and the remainder on the right; hence the same is true of the path of integration for $w$. It follows that the substitution of (32) in (31) (which still holds) yields the result, valid for all sufficiently large values of $|z|$,
$$G_k'(z) = \frac{(1 - p)\,z^{p-2}}{2\pi i}\int_{-i\infty}^{i\infty}\prod_{j=1}^{r}(c_j - w + c_j w z^{-1})^{-p_j}\,dw, \tag{33}$$
where there are cuts in the $w$-plane approximating to the segments of the real axis from $-\infty$ to $c_k$ and from $c_{k+1}$ to $+\infty$, and we use that branch of the integrand which has the value $z^{-p}$ when $w = -z$.

By splitting the range of integration into three parts, defined by $|w| \lessgtr |z|^{1/2}$, it may be seen that the correct asymptotic value, as $|z| \to \infty$, for the integral in (33) is given* by substituting $z^{-1} = 0$. Making this substitution and paying due regard to the appropriate branch of the many-valued functions involved, we find that, if $p > 1$, (33) yields (28), provided that
$$A_k = \frac{p - 1}{2\pi i}\int_{-i\infty}^{i\infty}\prod_{j=1}^{k}(w - c_j)^{-p_j}\prod_{j=k+1}^{r}(c_j - w)^{-p_j}\,dw, \tag{34}$$
where the $w$-plane is cut along the real axis from $-\infty$ to $c_k$ and from $c_{k+1}$ to $+\infty$.

* More generally, an asymptotic series for (33) is given by expanding the integrand as a power series in $z^{-1}$ and integrating term by term.





It is convenient to write (34) in the form
$$A_k = \frac{p - 1}{2\pi i}\int e^{-h(w)}\,dw, \tag{35}$$
where the contour extends to $\pm i\infty$ in the cut $w$-plane, and
$$h(w) = \sum_{j=1}^{k} p_j \log(w - c_j) + \sum_{j=k+1}^{r} p_j \log(c_j - w),$$
taking real values if $c_k < w < c_{k+1}$.
The integral (35) possesses a unique saddle-point $w = w_0$ in the cut plane, defined by $h'(w_0) = 0$, i.e.
$$\sum_{j=1}^{r}\frac{p_j}{w_0 - c_j} = 0, \qquad c_k < w_0 < c_{k+1} \tag{36}$$
(on $(c_k, c_{k+1})$ the left-hand side decreases strictly from $+\infty$ to $-\infty$, so the root there is unique). Taking the integral along the curve of steepest descent from this saddle-point we have, on the path of integration, $\operatorname{Im}(h(w)) = 0$. Hence the integrand is positive and decreasing away from the saddle-point, and this immediately yields the required result $A_k > 0$.
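The saddle-point $w_0$ defined by $\sum_j p_j/(w_0 - c_j) = 0$ in $(c_k, c_{k+1})$ is easy to locate numerically. The left-hand side decreases strictly on that interval from $+\infty$ to $-\infty$, so bisection always converges. The sketch below is a minimal Python version. It assumes $c_1 < \dots < c_r$ and all $p_j > 0$, and the helper name is illustrative.

```python
import math

def saddle_point(vertices, indices, k, tol=1e-13):
    """Root w0 of sum_j p_j / (w - c_j) = 0 in (c_k, c_{k+1}); k is 1-based.

    Assumes increasing vertices and positive indices, so the function is
    strictly decreasing on the open interval and has exactly one root there."""
    lo, hi = vertices[k - 1], vertices[k]

    def h_prime(w):
        return sum(p / (w - c) for c, p in zip(vertices, indices))

    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if h_prime(mid) > 0:   # still to the left of the root
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # r = 2: the root is (p1 c2 + p2 c1) / (p1 + p2).
    print(saddle_point([0.0, 1.0], [2.0, 3.0], 1))            # about 0.4
    # Vertices 0, 1, 2 with unit indices, k = 1: root 1 - 1/sqrt(3).
    print(saddle_point([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], 1), 1 - 1 / math.sqrt(3))
```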

By differentiating (33) and substituting from (34) we reach the further result, if $p > 1$, $p \ne 2$ and $|z| \to \infty$,
$$G_k''(z) \sim -(p - 2)A_k (z - c_k)^{Q-1}(c_{k+1} - z)^{P-2} \qquad (A_k > 0). \tag{37}$$


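(ii) The case $p < 1$. We first establish
LEMMA 6. If $A_k = A_k(c_1, \dots, c_r; p_1, \dots, p_r)$ is defined by
$$A_k = \lim_{|z| \to \infty}(z - c_k)^{1-Q}(c_{k+1} - z)^{1-P}\,G_k'(z), \tag{38}$$
where $P$ and $Q$ are defined in (28), then, if $0 < p < 1$,
$$\frac{\partial A_k}{\partial c_s} < 0 \quad (1 \le s \le k - 1), \qquad \frac{\partial A_k}{\partial c_s} > 0 \quad (k + 2 \le s \le r),$$
the notation implying existence, reality and finiteness.

PROOF. Consider the case $1 \le s \le k - 1$. Let $F^*(x)$ be the β-distribution with vertices and indices $c_j^*$, $p_j^*$, where $c_j^* = c_j$ (all $j$), $p_j^* = p_j$ ($j \ne s$), $p_s^* = p_s + 1$, so that $P^* = P + 1$, $Q^* = Q$, $p^* = p + 1$, $1 < p^* < 2$.

Then, using (31) and (29),
$$G_k^{*\prime}(z) = \frac{1 - p^*}{2\pi i}\int_{\Gamma_k} t^{p-1}(t + z - c_s)^{-p_s-1}\prod_{j \ne s}(t + z - c_j)^{-p_j}\,dt = -\frac{p}{p_s}\,\frac{\partial G_k(z)}{\partial c_s}.$$
Differentiating with respect to $z$, multiplying by $(z - c_k)^{1-Q}(c_{k+1} - z)^{1-P}$, letting $|z| \to \infty$ and substituting from (38) and (37) yields $-(p/p_s)(\partial A_k/\partial c_s) = -(p^* - 2)A_k^* = (1 - p)A_k^* > 0$, as required. Here (37) is applied to $F^*$, whose coefficient $A_k^*$ is positive by part (i) since $p^* > 1$. The corresponding result when $k + 2 \le s \le r$ is proved in a precisely similar manner.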







From Lemma 6 we deduce immediately
$$A_k(c_k, \dots, c_k, c_{k+1}, \dots, c_{k+1}; p_1, \dots, p_r) \le A_k(c_1, \dots, c_r; p_1, \dots, p_r), \tag{39}$$
where, on identifying coincident vertices, we have
$$A_k(c_k, \dots, c_k, c_{k+1}, \dots, c_{k+1}; p_1, \dots, p_r) = A(c_k, c_{k+1}; P, Q), \tag{40}$$
dropping the subscript $k$ from $A_k$ in the case $r = 2$.
Now Theorem 5 (i), in conjunction with (38), shows that
$$A(c_k, c_{k+1}; P, Q) = (c_{k+1} - c_k)^{1-p}\,\Gamma(p)/(\Gamma(P)\Gamma(Q)) > 0,$$
which, with (39) and (40), proves that $A_k(c_1, \dots, c_r; p_1, \dots, p_r) > 0$ if $0 < p < 1$. Since the other cases have been dealt with in Section 11 (i) and Theorem 5 (ii), this completes the proof of Theorem 6.

(iii) Proof of the corollaries. Corollary 1 is immediate. To prove Corollary 2 we remark that if $G(z)$ is any nontrivial analytic continuation of $F(x)$, then, by Corollary 1, the limit $\lim_{|z| \to \infty}(\log G'(z)/\log z)$ exists and is equal to $p - 2$. Hence this limit uniquely determines the value of $p$.

Corollary 3. Since a β-distribution has only a finite number of vertices, an infinity of the given points $x$ must occur in an interval lying between two consecutive vertices of each β-distribution. The corresponding analytic continuations bear a simple relation to each other, and determine the same value for $p$. This proves Corollary 3.


12. Proof of Theorem 3. If $F(x^1, \dots, x^n)$ is a β-distribution which is not concentrated at a single point, then (by Section 4 (iii)) the same is true of the marginal distribution of at least one of the individual component variates $x^1, \dots, x^n$, and furthermore the resulting unidimensional β-distribution has the same exponent as the original. Theorem 3 now follows immediately from Corollary 2 to Theorem 6 (Section 8).


13. Examples of β-distributions. In addition to the basic β-distributions defined in Section 6 the following examples may be mentioned.

(i) Starting with the 2-dimensional basic β-distribution, and using the transformation (iii) of Section 4, we take the marginal distribution of $x^1$. Writing $x$, $p$, $q$ for $x^1$, $p_1$, $p_2$, we find that this yields the familiar β-distribution
$$dF = C x^{p-1}(1 - x)^{q-1}\,dx \qquad (0 < x < 1).$$
The case $p = q = \tfrac12$ is known [7] as the "Arc Sine Law".
(ii) Taking a fixed non-degenerate $n$-dimensional simplex in $n$-dimensional space as the simplex of reference, and an arbitrary interior point as the unit-point, we may establish a system of $n + 1$ homogeneous coordinates $\xi^1, \dots, \xi^{n+1}$. Writing $x^i = \xi^i/\sum_j \xi^j$, we have a generalization of 2-dimensional "areal" coordinates, subject to the identical relation $\sum_i x^i = 1$. Then a distribution extended over the interior of the simplex of reference with density proportional to $\prod_j (x^j)^{p_j - 1}$ ($p_j > 0$) is a β-distribution, since it is obtained from the $(n + 1)$-dimensional basic β-distribution by the transformations (iii) and (i) of Section 4.

(iii) A particular case of (ii), given by taking $p_j = 1$ for all $j$ and using the transformation (ii) of Section 4, is a uniform distribution over any non-degenerate simplex in space of any dimension.
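Examples (ii) and (iii) also suggest a simple way to simulate such distributions. The following is a standard fact that the text does not state: if $Y_1, \dots, Y_{n+1}$ are independent Gamma variates with shape parameters $p_1, \dots, p_{n+1}$, then the normalized vector $(Y_j/\sum_i Y_i)$ has density proportional to $\prod_j (x^j)^{p_j-1}$ on the simplex. Mapping these areal coordinates to the vertices $V_j$ gives the point $\sum_j x^j V_j$ of the simplex of reference. The sketch below is illustrative and uses only Python's `random.gammavariate`. With all $p_j = 1$ it samples the uniform distribution of example (iii).

```python
import random

def sample_areal(vertices, indices, rng=random):
    """Draw one point sum_j x^j V_j, where (x^1, ..., x^{n+1}) has density
    proportional to prod_j (x^j)**(p_j - 1) on the simplex (all p_j > 0)."""
    ys = [rng.gammavariate(p, 1.0) for p in indices]
    total = sum(ys)
    weights = [y / total for y in ys]
    dim = len(vertices[0])
    return tuple(sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim))

if __name__ == "__main__":
    rng = random.Random(0)
    triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    pts = [sample_areal(triangle, [1.0, 1.0, 1.0], rng) for _ in range(20000)]
    # Uniform on this triangle: the sample mean should be near the centroid (1/3, 1/3).
    print(sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
```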

(iv) An explicit expression for the unidimensional β-distribution with indices $p$, $q$, $r$ and vertices $a$, $b$, $c$ ($a < b < c$) is most easily obtained as the marginal distribution of the corresponding 2-dimensional distribution over the non-degenerate triangle with vertices $(a, 0)$, $(b, h)$, $(c, 0)$. Using a result given on p. 293 of [6] we find that, in the range $a < x < b$, the density of the distribution is given by
$$\frac{dF}{dx} = \frac{\Gamma(p + q + r)}{\Gamma(p)\Gamma(q + r)}\,\frac{(x - a)^{q+r-1}(c - x)^{p-1}}{(b - a)^{q}(c - a)^{p+r-1}}\,F(1 - p, q; q + r; u),$$
where $u = [(c - b)/(b - a)][(x - a)/(c - x)]$ and $F(\alpha, \beta; \gamma; t)$ is the ordinary hypergeometric function. There is, of course, a similar formula applicable in the range $b < x < c$.
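The closed form for the range $a < x < b$ can be evaluated directly:
$$\frac{dF}{dx} = \frac{\Gamma(p+q+r)}{\Gamma(p)\Gamma(q+r)}\,(x - a)^{q+r-1}(c - x)^{p-1}(b - a)^{-q}(c - a)^{1-p-r}\,F(1 - p, q; q + r; u).$$
The sketch below sums the Gauss series for $F$, which converges for $0 \le u < 1$, that is, for $a < x < b$. It uses the Python standard library only, and the helper names are illustrative. Two special cases give easy checks. With $p = 1$ and $q + r = 1$ the density is the constant $1/((b - a)^q (c - a)^r)$. With $p = q = r = 1$ it is the rising half $2(x - a)/((b - a)(c - a))$ of the triangular density.

```python
import math

def hyp2f1(a, b, c, t, max_terms=100000):
    """Gauss hypergeometric series 2F1(a, b; c; t), intended for 0 <= t < 1."""
    total, term = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * t
        total += term
        if abs(term) <= 1e-15 * abs(total):   # also stops terminating series
            break
    return total

def density_left(x, a, b, c, p, q, r):
    """Density on (a, b) of the beta-distribution with vertices a < b < c
    and indices p, q, r, via the hypergeometric form."""
    u = (c - b) / (b - a) * (x - a) / (c - x)
    const = math.gamma(p + q + r) / (math.gamma(p) * math.gamma(q + r))
    return (const * (x - a) ** (q + r - 1) * (c - x) ** (p - 1)
            / ((b - a) ** q * (c - a) ** (p + r - 1))
            * hyp2f1(1 - p, q, q + r, u))

if __name__ == "__main__":
    print(density_left(0.5, 0.0, 1.0, 3.0, 1.0, 0.4, 0.6), 1 / 3.0 ** 0.6)   # uniform on (a, b)
    print(density_left(0.7, 0.0, 1.0, 3.0, 1.0, 1.0, 1.0), 2 * 0.7 / 3.0)    # triangular
```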

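The symmetry of the original definition is restored by observing that in any range the density can be expressed in terms of a solution of Riemann's P-equation as follows:
$$\frac{dF}{dx} \propto (x - a)^{-\alpha'}(x - b)^{-\beta'}(x - c)^{-\gamma'}\,P\begin{Bmatrix} a & b & c & \\ \alpha & \beta & \gamma & x \\ \alpha' & \beta' & \gamma' & \end{Bmatrix},$$
provided that $\alpha' + \beta + \gamma = p$, $\alpha + \beta' + \gamma = q$, $\alpha + \beta + \gamma' = r$, $\alpha + \beta + \gamma = p + q + r - 1$.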

It should be noticed that, if $p = q + r = 1$, this distribution is uniform over the interval $(a, b)$, but not over the interval $(b, c)$. Hence this gives a nontrivial example of Corollary 3 to Theorem 6 (Section 8). We may also specifically mention the case $p = q = r = 1$, which gives

(v) the ordinary (univariate) triangular distribution.

(vi) Certain distributions arising in connection with the random division of an interval, such as that studied by Fisher [2] in 1929, are β-distributions; indeed, several investigations of this problem ([4], [1], [3]) take (3) as their starting-point (with integer values for the $p_j$). Another approach, adumbrated in the summary (Section 0), is to apply the transformation (iv) of Section 4 to the case (iii) of this paragraph, the transforming matrix $M$ being of unit rank.


REFERENCES
[1] D. E. Barton and F. N. David, "Sums of ordered intervals and distances", Mathematika, Vol. 2 (1955), pp. 150-159.
[2] R. A. Fisher, "Tests of significance in harmonic analysis", Proc. Roy. Soc. Edinburgh Sec. A, Vol. 125 (1929), pp. 54-59.
[3] J. G. Mauldon, "An inversion formula for a generalized transform", Mathematika, Vol. 4 (1957), pp. 146-155.
[4] J. G. Mauldon, "Random division of an interval", Proc. Cambridge Philos. Soc., Vol. 47 (1951), pp. 331-336.
[5] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, Amer. Math. Soc., 1943, p. 11.
[6] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge University Press, 1935.
[7] W. Feller, Probability Theory (Vol. 1), John Wiley and Sons, New York, 1950.





DETERMINING BOUNDS ON INTEGRALS WITH APPLICATIONS
TO CATALOGING PROBLEMS¹


By BERNARD HARRIS²
Stanford University


1. Introduction and summary. Assume that a random sample of size N has 
been drawn from a multinomial population with an unknown and perhaps 
countably infinite number of classes. 

Hence, if $X_j$ is the $j$th observation, and $M_i$ the $i$th class, then
$$P\{X_j \in M_i\} = p_i \ge 0, \qquad i = 1, 2, \dots; \text{ for all } j,$$
and $\sum_{i=1}^{\infty} p_i = 1$.

If the number of classes is finite, then $p_i = 0$ for all $i > S$, where $S$ is the number of classes.

We do not suppose the classes to have a natural ordering, since the classes 
may be species of insects, or chess openings. 


Let $n_r$ be the number of classes which occur exactly $r$ times in the sample. Then
$$\sum_{r=0}^{N} r n_r = N \quad\text{and}\quad d = \sum_{r=1}^{N} n_r,$$
where $d$ is the number of distinct classes observed in the sample.
It is the purpose of this paper to present some techniques to aid the experimenter in answering the following kinds of questions.
1) Prediction of the number of distinct classes that will be observed in a second sample of size $\alpha N$, $\alpha \ge 1$.
2) Prediction of the number of additional classes that will be observed when the sample size is increased by $(\alpha - 1)N$ additional observations.
3) Estimation of the coverage of the sample, where coverage, denoted by $C$, is defined as follows:
$$C = \sum_j p_j. \tag{1}$$
The sum is to be taken over those classes for which at least one representative has been observed.

4) Prediction of the coverage of a second sample of size $\alpha N$.

5) Prediction of the increased coverage to be obtained when the sample size is augmented by $(\alpha - 1)N$ additional observations.

Received May 20, 1958; revised January 9, 1959.

¹ This research was supported in part by the Office of Naval Research under Contract Nonr-225(21). Reproduction in whole or in part is permitted for any purpose of the United States Government.

² Now at University of Nebraska.





522 BERNARD HARRIS 


Let $d(\alpha)$ and $C(\alpha)$ denote the number of classes and coverage obtained from $\alpha N$ observations, either in the case of a second sample, or an augmented sample. Subsequently, we will show that
$$Ed(\alpha) \approx d + n_1 E\left\{\frac{1 - e^{-(\alpha-1)x}}{x}\right\} \tag{2}$$
and
$$EC(\alpha) \approx 1 - \frac{n_1}{N} + \frac{n_1}{N}E\{1 - e^{-(\alpha-1)x}\}, \tag{3}$$
where the expectations on the right hand side of (2) and (3) are taken with respect to $F(x)$, a constructed cumulative distribution function on $[0, \infty)$, which is unknown to the experimenter, but estimates of the moments of this distribution are available.

It will be shown that
$$\mu_r \approx \frac{(r + 1)!\,n_{r+1}}{n_1}, \tag{4}$$
where $\mu_r = \int_0^\infty x^r\,dF(x)$.

Hence, a reasonable procedure is to compute upper (lower) predictors for 
d(a) and C(a) by computing the supremum (infimum) of (2) and (3) respec- 
tively, where the supremum (infimum) is taken over all cumulative distribution 
functions whose first k moments are specified by (4). 

It will be shown that 


$$\varphi(x) = \frac{1 - e^{-(\alpha-1)x}}{x} \quad\text{and}\quad \psi(x) = 1 - e^{-(\alpha-1)x}$$
may be treated identically in the computation of extrema with the observation that the computation of the supremum of $E\{\varphi(x)\}$ will be identical with the computation of the infimum of $E\{\psi(x)\}$; and similarly for the computation of the infimum of $E\{\varphi(x)\}$. Thus, we will restrict the discussion to $\varphi(x)$.

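The solution will be of the form
$$\sup(\inf)\,E\{\varphi(x)\} = \sum_{i=1}^{r}\lambda_i\,\varphi(x_i), \tag{5}$$
where $\lambda_i \ge 0$, $\sum_{i=1}^{r}\lambda_i = 1$ and $x_i$ belongs to the extended non-negative real numbers.
To determine the $x_i$ and $\lambda_i$, the following system of equations must be solved:
$$\begin{aligned} \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_r x_r &= \mu_1, \\ \lambda_1 x_1^2 + \lambda_2 x_2^2 + \dots + \lambda_r x_r^2 &= \mu_2, \\ &\ \,\vdots \\ \lambda_1 x_1^k + \lambda_2 x_2^k + \dots + \lambda_r x_r^k &= \mu_k, \end{aligned} \tag{6}$$
where $x_1 \le x_2 \le \dots \le x_r$.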





DETERMINING BOUNDS ON INTEGRALS 523 


If $k = 2c$, $c$ an integer, the supremum will be obtained by solving (6), with $x_1 = 0$, $r \le (k + 2)/2$, and $x_2, \dots, x_r$ interior points of $[0, \infty)$. If $k = 2c + 1$, the solution of (6) will coincide with that for $k - 1$ with the addition of a mass point at infinity, with mass tending to zero at a rate which will satisfy the $k$th moment constraint. Since $\varphi(\infty) = 0$, no change in (5) is obtained by the addition of the $k$th moment constraint.

If $k = 2c + 1$, the infimum is obtained by solving (6) with $r \le (k + 1)/2$ and the $x_i$ are all interior points of $[0, \infty)$. If $k = 2c$, $c > 0$, the solution of (6) is obtained from that for $k - 1$ by the addition of a mass point at infinity with mass tending to zero at a rate which will satisfy the last moment constraint.

Explicit solutions are computed for $k \le 3$, and applied to several examples.
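For $k \le 2$ the recipe can be carried out by hand. The paper's own explicit solutions for $k \le 3$ are not reproduced here; the following is derived from the recipe as stated, with $\varphi(x) = (1 - e^{-(\alpha-1)x})/x$, $\varphi(0) = \alpha - 1$ and $\varphi(\infty) = 0$.

- For $k = 1$, the infimum of $E\{\varphi(x)\}$ is $\varphi(\mu_1)$, attained by a single mass point. The supremum is $\varphi(0) = \alpha - 1$, which is the $k = 0$ solution plus a vanishing mass at infinity.
- For $k = 2$, the supremum comes from the two-point solution of (6) with $x_1 = 0$, which gives $x_2 = \mu_2/\mu_1$ and $\lambda_2 = \mu_1^2/\mu_2$. The infimum is again $\varphi(\mu_1)$.

The Python sketch below uses illustrative names and assumes $\mu_2 \ge \mu_1^2 > 0$.

```python
import math

def phi(x, alpha):
    """phi(x) = (1 - exp(-(alpha - 1) x)) / x, with phi(0) = alpha - 1, phi(inf) = 0."""
    if x == 0:
        return alpha - 1
    if math.isinf(x):
        return 0.0
    return -math.expm1(-(alpha - 1) * x) / x

def phi_bounds(alpha, mu1, mu2=None):
    """(inf, sup) of E phi(X) over distributions on [0, inf) with the given
    first moment (k = 1) or first two moments (k = 2)."""
    lower = phi(mu1, alpha)                # single mass point at mu1
    if mu2 is None:                        # k = 1
        return lower, phi(0.0, alpha)      # mass at 0 plus vanishing mass at infinity
    if mu2 < mu1 ** 2:
        raise ValueError("not a legitimate moment sequence")
    x2 = mu2 / mu1                         # k = 2: atoms at 0 and x2
    lam2 = mu1 ** 2 / mu2
    upper = (1 - lam2) * phi(0.0, alpha) + lam2 * phi(x2, alpha)
    return lower, upper

if __name__ == "__main__":
    print(phi_bounds(2.0, 1.0))        # k = 1
    print(phi_bounds(2.0, 1.0, 2.0))   # k = 2
```

For example, take $\alpha = 2$, $\mu_1 = 1$, $\mu_2 = 2$ (the first two moments of a unit exponential). The $k = 2$ bounds are about 0.632 and 0.716, and the exponential distribution itself gives $E\varphi = \ln 2 \approx 0.693$, which lies between them.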

In addition, the low order moments of $d$, $n_r$, and $C$ are computed, and the asymptotic sampling error of $C(1) = 1 - (n_1/N)$ as an estimate of sample coverage is given.

It will be convenient before proceeding to the general problem to compute the low order moments of $d$, $n_r$, $C$.


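2. Moments of $d$, $n_r$ and $C$. To compute $Ed$ and $Ed^2$, define a sequence of random variables $\{Y_j\}$ as follows: Let
$$Y_j = \begin{cases} 1, & \text{if the $j$th class occurs in sample,} \\ 0, & \text{otherwise.} \end{cases}$$
Then $Ed = E\sum_j Y_j = \sum_j EY_j$. Since $EY_j = 1 - (1 - p_j)^N$, we have
$$E(d) = \sum_j [1 - (1 - p_j)^N]. \tag{7}$$
Similarly,
$$\begin{aligned} Ed^2 = E\Big(\sum_j Y_j\Big)^2 &= E\Big(\sum_j Y_j^2\Big) + E\Big(\sum_{i \ne j} Y_i Y_j\Big) = \sum_j [1 - (1 - p_j)^N] \\ &\quad + \sum_{i \ne j}[1 - (1 - p_i)^N - (1 - p_j)^N + (1 - p_i - p_j)^N]. \end{aligned} \tag{8}$$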


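Thus,
$$\begin{aligned} \sigma_d^2 &= Ed^2 - (Ed)^2 \\ &= \sum_j [1 - (1 - p_j)^N] + \sum_{i \ne j}[1 - (1 - p_i)^N - (1 - p_j)^N + (1 - p_i - p_j)^N] - \Big(\sum_j [1 - (1 - p_j)^N]\Big)^2 \\ &= \sum_j [(1 - p_j)^N - (1 - p_j)^{2N}] + \sum_{i \ne j}[(1 - p_i - p_j)^N - (1 - p_i)^N(1 - p_j)^N] \\ &= \sum_j [(1 - p_j)^N - (1 - 2p_j)^N] + \sum_{i, j}[(1 - p_i - p_j)^N - (1 - p_i)^N(1 - p_j)^N]. \end{aligned}$$

It is easily seen that the second term in $\sigma_d^2$ is always negative. For, since $(1 - p_i - p_j) - (1 - p_i)(1 - p_j) = -p_i p_j < 0$ (the case $N = 1$), raising to the $N$th power gives $(1 - p_i)^N(1 - p_j)^N \ge (1 - p_i - p_j)^N$.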

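To compute $En_r$, define random variables $Z_j^{(r)}$ as follows:
$$Z_j^{(r)} = \begin{cases} 1, & \text{if the $j$th class occurs $r$ times in sample,} \\ 0, & \text{otherwise.} \end{cases}$$
Clearly,
$$E(n_r) = \sum_j EZ_j^{(r)}, \qquad E(Z_j^{(r)}) = \binom{N}{r}p_j^r(1 - p_j)^{N-r}.$$
Hence,
$$E(n_r) = \sum_j \binom{N}{r}p_j^r(1 - p_j)^{N-r}. \tag{10}$$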


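To find $\sigma_{n_r}^2$ and $\operatorname{cov}(n_r, n_s)$, we first compute $E(n_r n_s)$:
$$E(n_r n_s) = E\Big(\sum_i Z_i^{(r)}\sum_j Z_j^{(s)}\Big) = \begin{cases} \displaystyle\sum_{i \ne j}\frac{N!}{r!\,s!\,(N - r - s)!}\,p_i^r p_j^s(1 - p_i - p_j)^{N-r-s}, & r \ne s, \\[2ex] \displaystyle\sum_j \binom{N}{r}p_j^r(1 - p_j)^{N-r} + \sum_{i \ne j}\frac{N!}{(r!)^2(N - 2r)!}\,p_i^r p_j^r(1 - p_i - p_j)^{N-2r}, & r = s. \end{cases} \tag{11}$$
Thus, for $r \ne s$,
$$\begin{aligned} \operatorname{cov}(n_r, n_s) &= \sum_{i \ne j}\frac{N!\,p_i^r p_j^s}{r!\,s!}\left[\frac{(1 - p_i - p_j)^{N-r-s}}{(N - r - s)!} - \frac{N!\,(1 - p_i)^{N-r}(1 - p_j)^{N-s}}{(N - r)!\,(N - s)!}\right] \\ &\quad - \sum_j \frac{(N!)^2}{r!\,s!\,(N - r)!\,(N - s)!}\,p_j^{r+s}(1 - p_j)^{2N-r-s}, \end{aligned} \tag{12}$$
and
$$\begin{aligned} \sigma_{n_r}^2 &= \sum_{i \ne j}\frac{N!\,p_i^r p_j^r}{(r!)^2}\left[\frac{(1 - p_i - p_j)^{N-2r}}{(N - 2r)!} - \frac{N!\,(1 - p_i)^{N-r}(1 - p_j)^{N-r}}{((N - r)!)^2}\right] \\ &\quad + \sum_j\left[\binom{N}{r}p_j^r(1 - p_j)^{N-r} - \binom{N}{r}^2 p_j^{2r}(1 - p_j)^{2N-2r}\right]. \end{aligned} \tag{13}$$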


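To compute $EC$, the random variables $Y_j$ used in the computation of $Ed$ are again employed. Then we have $C = \sum_j p_j Y_j$ and
$$EC = \sum_j p_j EY_j = \sum_j p_j[1 - (1 - p_j)^N] = 1 - \sum_j p_j(1 - p_j)^N. \tag{14}$$
Similarly,
$$\begin{aligned} EC^2 &= E\Big(\sum_j p_j Y_j\Big)^2 = E\Big(\sum_i\sum_j p_i p_j Y_i Y_j\Big) \\ &= \sum_j p_j^2[1 - (1 - p_j)^N] + \sum_{i \ne j}p_i p_j[1 - (1 - p_i)^N - (1 - p_j)^N + (1 - p_i - p_j)^N] \\ &= \sum_j p_j^2[(1 - p_j)^N - (1 - 2p_j)^N] + 1 - 2\sum_j p_j(1 - p_j)^N + \sum_{i, j}p_i p_j(1 - p_i - p_j)^N, \end{aligned} \tag{15}$$
$$\begin{aligned} \sigma_C^2 &= EC^2 - (EC)^2 = EC^2 - \Big[1 - \sum_j p_j(1 - p_j)^N\Big]^2 \\ &= \sum_j p_j^2[(1 - p_j)^N - (1 - 2p_j)^N] + \sum_{i, j}p_i p_j[(1 - p_i - p_j)^N - (1 - p_i)^N(1 - p_j)^N]. \end{aligned} \tag{16}$$
As in the case of $\sigma_d^2$, the second term in $\sigma_C^2$ is always negative.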
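The variance formulas can be checked by brute force when $N$ and the number of classes are small. The formulas being checked are
$$\sigma_d^2 = \sum_j[(1 - p_j)^N - (1 - 2p_j)^N] + \sum_{i,j}[(1 - p_i - p_j)^N - (1 - p_i)^N(1 - p_j)^N]$$
and
$$\sigma_C^2 = \sum_j p_j^2[(1 - p_j)^N - (1 - 2p_j)^N] + \sum_{i,j}p_ip_j[(1 - p_i - p_j)^N - (1 - p_i)^N(1 - p_j)^N].$$
In both, the double sums run over all ordered pairs $(i, j)$, including $i = j$. The sketch below enumerates every ordered sample of size $N$ from a finite population, computes $\operatorname{Var}(d)$ and $\operatorname{Var}(C)$ directly, and compares them with these closed forms. It uses the Python standard library only, and the function names are illustrative.

```python
import itertools
import math

def brute_force_var(probs, N):
    """Var(d) and Var(C) by enumerating all len(probs)**N ordered samples."""
    Ed = Ed2 = EC = EC2 = 0.0
    for sample in itertools.product(range(len(probs)), repeat=N):
        weight = math.prod(probs[i] for i in sample)   # probability of this sample
        seen = set(sample)
        d = len(seen)                                  # distinct classes observed
        C = sum(probs[i] for i in seen)                # coverage
        Ed += weight * d
        Ed2 += weight * d * d
        EC += weight * C
        EC2 += weight * C * C
    return Ed2 - Ed ** 2, EC2 - EC ** 2

def formula_var(probs, N):
    """Closed forms for sigma_d^2 and sigma_C^2 (double sums over all i, j)."""
    q = [(1 - p) ** N for p in probs]
    var_d = sum(qj - (1 - 2 * p) ** N for p, qj in zip(probs, q))
    var_C = sum(p * p * (qj - (1 - 2 * p) ** N) for p, qj in zip(probs, q))
    for i, pi in enumerate(probs):
        for j, pj in enumerate(probs):
            cross = (1 - pi - pj) ** N - q[i] * q[j]
            var_d += cross
            var_C += pi * pj * cross
    return var_d, var_C

if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    print(brute_force_var(probs, 4))
    print(formula_var(probs, 4))
```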
In Appendix A, it is shown that for large $N$, satisfactory exponential approximations may be used for $Ed$, $En_r$, $EC$. Employing these, we obtain for $Ed$, $En_r$, and $EC$:
$$E(d) \approx \sum_j [1 - e^{-Np_j}], \tag{17}$$
$$E(n_r) \approx \frac{1}{r!}\sum_j (Np_j)^r e^{-Np_j}, \tag{18}$$
$$E(C) \approx 1 - \sum_j p_j e^{-Np_j}. \tag{19}$$
From (18) and (19), we get
$$E(C) \approx 1 - \frac{E(n_1)}{N}. \tag{20}$$

Henceforth, these approximations will be employed throughout.
It should be noted that the first two moments of $n_r$ and (20) are obtained in papers by Good [3] and Good and Toulmin [4].
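To see how close the exponential approximations are to the exact expressions, the sketch below evaluates both for a given vector of class probabilities. The exact forms are (7), (10) and (14); the approximations are (17), (18) and (19). Under the approximations, (20) holds as an identity, and the code checks this as well. It uses the Python standard library only, and the function names and the probability vector are illustrative.

```python
import math

def exact_moments(probs, N, r):
    """Exact E(d), E(C), E(n_r) from (7), (14), (10)."""
    Ed = sum(1 - (1 - p) ** N for p in probs)
    EC = 1 - sum(p * (1 - p) ** N for p in probs)
    Enr = sum(math.comb(N, r) * p ** r * (1 - p) ** (N - r) for p in probs)
    return Ed, EC, Enr

def approx_moments(probs, N, r):
    """Exponential approximations (17), (19), (18)."""
    Ed = sum(1 - math.exp(-N * p) for p in probs)
    EC = 1 - sum(p * math.exp(-N * p) for p in probs)
    Enr = sum((N * p) ** r * math.exp(-N * p) for p in probs) / math.factorial(r)
    return Ed, EC, Enr

if __name__ == "__main__":
    probs = [0.02] * 50          # 50 equally likely classes (illustrative)
    print(exact_moments(probs, 100, 1))
    print(approx_moments(probs, 100, 1))
```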


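3. Prediction of number of classes observed and coverage obtained in enlarged and second samples. From (7) and (17), we have
$$E(d(\alpha)) = \sum_j [1 - (1 - p_j)^{\alpha N}] \tag{21}$$
and
$$E(d(\alpha)) \approx \sum_j [1 - e^{-\alpha N p_j}] = E(d) + \sum_j (e^{-Np_j} - e^{-\alpha N p_j}). \tag{22}$$
Then
$$E(d(\alpha)) \approx E(d) + \sum_j\left[\frac{1 - e^{-(\alpha-1)Np_j}}{Np_j}\right]Np_j e^{-Np_j}. \tag{23}$$
In exactly the same manner,
$$E(C(\alpha)) = 1 - \sum_j p_j(1 - p_j)^{\alpha N} \tag{24}$$
and
$$\begin{aligned} E(C(\alpha)) &\approx 1 - \sum_j p_j e^{-\alpha N p_j} = 1 - \sum_j [p_j e^{-Np_j} + p_j e^{-\alpha N p_j} - p_j e^{-Np_j}] \\ &= E(C) + \sum_j p_j(e^{-Np_j} - e^{-\alpha N p_j}). \end{aligned} \tag{25}$$
Then
$$E(C(\alpha)) \approx E(C) + \frac{1}{N}\sum_j [1 - e^{-(\alpha-1)Np_j}]\,Np_j e^{-Np_j}. \tag{26}$$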







Formulas (23) and (26) have an interesting interpretation. The second term on the right hand side in each case is the expected increment when the sample size is increased by $(\alpha - 1)N$; or equivalently the expected increment over the number of classes, or coverage of the first sample, that will be obtained in a second sample of size $\alpha N$.

Define
$$F(x) = \frac{\sum_{Np_j \le x} Np_j e^{-Np_j}}{\sum_j Np_j e^{-Np_j}}. \tag{27}$$
One readily observes that $F(x)$ is a cumulative distribution function, and since it depends on the unknown parameters $(p_1, p_2, \dots)$, it is unknown to the experimenter.

Consider $\mu_r$, the $r$th moment of $F(x)$:
$$\mu_r = \int_0^\infty x^r\,dF(x) = \frac{\sum_j (Np_j)^{r+1}e^{-Np_j}}{\sum_j Np_j e^{-Np_j}}.$$
Then, from (18), we have
$$\mu_r \approx \frac{(r + 1)!\,E(n_{r+1})}{E(n_1)}. \tag{28}$$
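Also, from (23) and (27), we have
$$E(d(\alpha)) \approx E(d) + E(n_1)\int_0^\infty \frac{1 - e^{-(\alpha-1)x}}{x}\,dF(x) \tag{29}$$
and, from (26) and (27),
$$\begin{aligned} E(C(\alpha)) &\approx E(C) + \frac{E(n_1)}{N}\int_0^\infty (1 - e^{-(\alpha-1)x})\,dF(x) \\ &\approx 1 - \frac{E(n_1)}{N} + \frac{E(n_1)}{N}\int_0^\infty (1 - e^{-(\alpha-1)x})\,dF(x). \end{aligned} \tag{30}$$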
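The construction of $F$ can be checked numerically for any assumed vector of class probabilities, which the experimenter does not, of course, know. Here $F$ places weight proportional to $Np_je^{-Np_j}$ on the atom $Np_j$. The sketch below builds these atoms and weights and checks two identities:

- the moments of $F$ match $(r + 1)!\,E(n_{r+1})/E(n_1)$ computed from the approximation $E(n_r) \approx \sum_j (Np_j)^r e^{-Np_j}/r!$;
- $\sum_j [1 - e^{-\alpha N p_j}]$ equals $E(d) + E(n_1)\int_0^\infty (1 - e^{-(\alpha-1)x})/x\,dF(x)$.

It uses the Python standard library only; the names and the probability vector are illustrative.

```python
import math

def F_atoms(probs, N):
    """Atoms N p_j of F and their weights, proportional to N p_j exp(-N p_j)."""
    raw = [N * p * math.exp(-N * p) for p in probs]
    total = sum(raw)
    return [N * p for p in probs], [w / total for w in raw]

def approx_En(probs, N, r):
    """Exponential approximation to E(n_r)."""
    return sum((N * p) ** r * math.exp(-N * p) for p in probs) / math.factorial(r)

if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.05, 0.05]   # illustrative only
    N, alpha = 20, 3.0
    xs, ws = F_atoms(probs, N)
    mu2 = sum(w * x ** 2 for x, w in zip(xs, ws))
    print(mu2, math.factorial(3) * approx_En(probs, N, 3) / approx_En(probs, N, 1))
    Ed = sum(1 - math.exp(-N * p) for p in probs)
    lhs = sum(1 - math.exp(-alpha * N * p) for p in probs)
    rhs = Ed + approx_En(probs, N, 1) * sum(
        w * (1 - math.exp(-(alpha - 1) * x)) / x for x, w in zip(xs, ws))
    print(lhs, rhs)
```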


Replacing the expected values by the observed quantities in (28), (29) and (30), we obtain
$$m_r = \frac{(r + 1)!\,n_{r+1}}{n_1}, \tag{31}$$
$$d(\alpha) = d + n_1 E(\varphi(x)), \tag{32}$$
$$C(\alpha) = 1 - \frac{n_1}{N} + \frac{n_1}{N}E(\psi(x)), \tag{33}$$
where the expected values are computed with respect to $F(x)$ as defined by (27).
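Computing the moment estimates $m_r = (r + 1)!\,n_{r+1}/n_1$ needs only the frequency-of-frequencies counts $n_r$. The sketch below is a minimal Python version. The function name is illustrative, and the sample is made up for the example.

```python
from collections import Counter
import math

def moment_estimates(observations, k):
    """m_1, ..., m_k with m_r = (r + 1)! n_{r+1} / n_1, where n_r is the
    number of classes seen exactly r times in the sample."""
    class_counts = Counter(observations)        # class -> number of occurrences
    n = Counter(class_counts.values())          # r -> n_r
    if n[1] == 0:
        raise ValueError("n_1 = 0: the estimates are undefined")
    return [math.factorial(r + 1) * n[r + 1] / n[1] for r in range(1, k + 1)]

if __name__ == "__main__":
    sample = list("aaabbcdddde")    # a:3, b:2, c:1, d:4, e:1, so n_1 = 2
    print(moment_estimates(sample, 3))   # [1.0, 3.0, 12.0]
```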







However, since F(x) is unknown, a reasonable procedure is to compute the 
supremum and infimum of the expected values using the m, as estimates of the 
moments. 

Since this technique, the method of moment inequalities, is of application in 
other problems, we will obtain some general results relative to computation of 
extrema of expected values of functions with respect to unknown cumulative 
distribution functions. 


4. Method of moment inequalities. We proceed now to investigate the computation of extrema of expected values. The methods used in this section are similar to those used in Chernoff and Reiter [1] and Karlin and Shapley [6].

Let $\mathcal{F}$ be the class of cumulative distribution functions on $[0, \infty)$, and $\mathcal{F}_b$ that subset of $\mathcal{F}$ all of whose elements have $F(b) = 1$.

Let $\mathcal{F}^{(\mu_1, \dots, \mu_k)}$ ($\mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$) be the class of cumulative distribution functions on $[0, \infty)$ ($[0, b]$) whose first $k$ moments are $\mu_1, \mu_2, \dots, \mu_k$, respectively. We will suppose throughout that $\mu_1, \mu_2, \dots, \mu_k$ is a legitimate moment sequence, i.e., that there exists a cumulative distribution function $F(x) \in \mathcal{F}$ ($\mathcal{F}_b$), whose first $k$ moments are $\mu_1, \mu_2, \dots, \mu_k$.

Let $g(x)$ be a function continuous and bounded on $[0, b]$.

Designate the subset of Euclidean $(k + 1)$-space, whose coordinates are $(\int_0^b g(x)\,dF(x), \mu_1, \mu_2, \dots, \mu_k)$, $F \in \mathcal{F}_b$, by $X_{k+1}$.

THEOREM 1. $X_{k+1}$ is closed, convex, and bounded.

PROOF: $X_{k+1}$ is clearly bounded.

To demonstrate convexity, note that any convex linear combination of elements of $\mathcal{F}_b$ is an element of $\mathcal{F}_b$, and the mapping $F \to (Eg(x), \mu_1, \dots, \mu_k) = T \cdot F$ is linear and thus preserves convexity.

To see that $X_{k+1}$ is closed, note that $\mathcal{F}_b$ is compact (in the topology of convergence in distribution). The conclusion follows upon application of the Helly-Bray Theorem.

The point
$$\Big(\mathop{\min\,(\max)}_{F \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}}\int_0^b g(x)\,dF(x),\ \mu_1, \mu_2, \dots, \mu_k\Big)$$
is then easily characterized as the boundary point $x^* \in X_{k+1}$ whose first coordinate is a minimum (maximum), with fixed second, third, $\dots$, $(k + 1)$th coordinates.

We also remark that $X_{k+1}$ is a $(k + 1)$-dimensional convex body as long as $g(x)$ is not linearly dependent on the monomials $1, x, x^2, \dots, x^k$. Henceforth, we will assume that $g(x)$ satisfies this condition.

THEOREM 2. The extreme points of $X_{k+1}$, $k > 1$, are exactly those points which correspond to the moment sequences of degenerate cumulative distribution functions, i.e.,
$$F(x) = \begin{cases} 0, & x < a, \\ 1, & x \ge a, \end{cases} \qquad a \in [0, b].$$

PROOF: One can easily show that any extreme point of $X_{k+1}$ is the image under $T$ of an extreme point of $\mathcal{F}_b$. It is easily seen that the extreme points of $\mathcal{F}_b$ are the degenerate cumulative distribution functions.

The points in $X_{k+1}$ corresponding to degenerate cumulative distribution functions have the form $x_t = (g(t), t, t^2, \dots, t^k)$, $t \in [0, b]$.

Consider the hyperplane $H(g(t), \mu_1, \dots, \mu_k) = t_0^2 - 2\mu_1 t_0 + \mu_2 = 0$, where $t_0$ is fixed and $t_0 \in [0, b]$.

Then,
$$(t - t_0)^2 = t^2 - 2t t_0 + t_0^2 \begin{cases} > 0, & t \ne t_0, \\ = 0, & t = t_0. \end{cases}$$
Thus, $t_0^2 - 2\mu_1 t_0 + \mu_2 = 0$ is a supporting hyperplane at $x_{t_0} = (g(t_0), t_0, t_0^2, \dots, t_0^k)$ and $x_{t_0}$ is not attainable as a non-trivial convex linear combination of points of $X_{k+1}$ corresponding to degenerate distributions.

Hence for $k > 1$, points in $X_{k+1}$ corresponding to degenerate distributions are the extreme points of $X_{k+1}$.

COROLLARY. If $g(x)$ is strictly convex (concave), the set of extreme points of $X_{k+1}$, $k \ge 1$, are exactly those points which correspond to the moment sequences of degenerate cumulative distribution functions.

Since $x^*$ is a boundary point of $X_{k+1}$, it can be represented as a convex linear combination of at most $k + 1$ extreme points of $X_{k+1}$, i.e.,
$$x^* = \sum_{i=1}^{k+1}\lambda_i x_{t_i} = \sum_{i=1}^{k+1}\lambda_i\,(g(t_i), t_i, t_i^2, \dots, t_i^k). \tag{34}$$
Hence, the minimizing (maximizing) distribution is discrete with positive probability concentrated on at most $k + 1$ points in $[0, b]$.

We will now show that the maximum number of points of positive probability can be reduced still further in the specific cases of interest to us.

Since $x^*$ is a boundary point of $X_{k+1}$, there is a supporting hyperplane at $x^*$, which also contains the extreme points of $X_{k+1}$ of which $x^*$ is a convex linear combination.

Thus, $a \cdot x^* = c$ and $a \cdot x \ge c$, $x \in X_{k+1}$. In particular,
$$\sum_{i=1}^{k} a_i t^i + a_{k+1}g(t) \ge c \quad\text{for all } t \in [0, b], \tag{35}$$
with equality holding for those extreme points of which $x^*$ can be written as a convex linear combination. Rewriting (35) with $a_0 = -c$, we have
$$P(t) = \sum_{i=0}^{k} a_i t^i + a_{k+1}g(t) \ge 0, \qquad t \in [0, b]. \tag{36}$$

Then, to find the relevant extreme points, we have to find the roots of $P(t) = 0$, $t \in [0, b]$.

We note at once that all interior roots of $P(t)$ are multiple roots, since by (36), $P(t) \ge 0$ for all $t \in [0, b]$ (a non-negative differentiable function has zero derivative at any interior zero).

Let $r$ be the number of distinct real roots of $P(t)$ in $[0, b]$.

Define $r'$ as follows:
$$r' = \begin{cases} r, & \text{if } 0, b \text{ are not roots of } P(t), \\ r - \tfrac12, & \text{if one of } 0, b \text{ is a root of } P(t), \\ r - 1, & \text{if both } 0, b \text{ are roots of } P(t). \end{cases}$$

Let $\mathcal{G}_b^{(k)}$ be the collection of continuous, bounded, and monotonic functions on $[0, b]$, whose first $k$ derivatives exist and are monotonic in $(0, b)$. In addition, we require that $\mathcal{G}_b^{(k)}$ contain only functions not linearly dependent on the monomials $1, t, t^2, \dots, t^k$.

THEOREM 3. If $P(t) = Q(t) + ag(t)$, where $a$ is a real number, $Q(t)$ is a polynomial of degree $k$, and $g(t) \in \mathcal{G}_b^{(k)}$, then there are at most $k + 1$ real roots of $P(t)$ in $(0, b)$.

PROOF: We proceed by induction.

When $k = 0$, there is clearly at most one real root of $P(t)$ in $(0, b)$.

Then, suppose the conclusion holds when $k = n$, $n \ge 0$, for any function $P(t)$ satisfying the hypotheses of the theorem.

Then, if $P(t)$ satisfies the hypotheses of the theorem for $k = n + 1$, $P'(t)$ satisfies the hypotheses for $k = n$. In addition, between every two roots of $P(t)$ there is a root of $P'(t)$; but $P'(t)$ has at most $n + 1$ roots in $(0, b)$, hence $P(t)$ has at most $n + 2$ roots in $(0, b)$.

THEOREM 4. If $P(t)$ satisfies the hypotheses of Theorem 3 and in addition $P(t) \ge 0$ for all $t \in [0, b]$, then
$$r' \le \frac{k + 1}{2}. \tag{37}$$

PROOF: Let $\gamma$ be the number of distinct roots of $P(t)$ at $0$ and $b$, i.e., the number of distinct boundary roots.

Then $r' = r - (\gamma/2)$.

By Theorem 3, $P'(t)$ has at most $k$ distinct roots in $(0, b)$. Since $P(t)$ has only multiple roots in $(0, b)$, whenever $P(t) = 0$, $t \in (0, b)$, we have $P'(t) = 0$.

In addition, if $t_1, t_2$ ($t_1 < t_2$) are two distinct roots of $P(t)$, then there exists a $t^*$ such that $P'(t^*) = 0$, $t_1 < t^* < t_2$.

Hence, $(r - 1) + (r - \gamma) \le k$, $(r' + (\gamma/2) - 1) + (r' - (\gamma/2)) \le k$, $r' \le (k + 1)/2$.

Theorem 4 provides an extension of results contained in Chernoff and Reiter [1] and Rustagi [7]. The use of $r'$ is similar to Wald's [8] notion of the degree of a cumulative distribution function.

For any discrete cumulative distribution function $F(x) \in \mathcal{F}_b$, let $r'(F)$ be defined as follows:

Let $r_1(F)$ = number of saltuses of $F(x)$ in $(0, b)$,

$r_2(F)$ = number of saltuses of $F(x)$ at $0$ or $b$.
Then,
$$r'(F) = r_1(F) + \tfrac12 r_2(F).$$

$r'(F)$ is called the degree of the discrete cumulative distribution function $F(x)$.
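The degree $r'(F)$ is simple bookkeeping on the atoms of a discrete distribution on $[0, b]$: interior atoms count 1 and atoms at $0$ or $b$ count $\tfrac12$. The same goes for the modified count $w(F)$, which adds a further $\tfrac12$ when $F$ has a saltus at $b$. The sketch below is a small Python version. The helper names are illustrative, and it uses exact fractions so that half-integers compare exactly.

```python
from fractions import Fraction

def degree(atoms, b):
    """r'(F): atoms in (0, b) count 1, atoms at 0 or b count 1/2.
    `atoms` lists the points of positive mass of a discrete F on [0, b]."""
    support = set(atoms)
    if any(x < 0 or x > b for x in support):
        raise ValueError("atoms must lie in [0, b]")
    interior = sum(1 for x in support if 0 < x < b)
    boundary = sum(1 for x in support if x in (0, b))
    return interior + Fraction(boundary, 2)

def wald_w(atoms, b):
    """w(F) = r'(F), plus 1/2 if F has a saltus at b."""
    return degree(atoms, b) + (Fraction(1, 2) if b in set(atoms) else 0)

if __name__ == "__main__":
    print(degree([0.25, 0.75], 1), degree([0, 0.5, 1], 1), wald_w([0, 0.5, 1], 1))
```

For instance, two interior atoms and the configuration "one interior atom plus atoms at both 0 and b" both have degree 2.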







Thus, we have seen that the degree of the minimizing (maximizing) distribution $\in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$ is at most $(k + 1)/2$ for functions in $\mathcal{G}_b^{(k)}$.

We now employ some results due to Wald [8] to establish the following theorem:

THEOREM 5. There are exactly two cumulative distribution functions $F_1^*(x)$ and $F_2^*(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$ with $r'(F) \le (k + 1)/2$, where $F_1^*(x)$ is continuous at $b$, and $F_2^*(x)$ has a saltus at $b$.

PROOF: From Theorem 4, there exist at least two cumulative distribution functions of degree $\le (k + 1)/2$. They are the minimizing and maximizing distributions respectively for functions in $\mathcal{G}_b^{(k)}$.

Let
$$w(F) = \begin{cases} r'(F), & \text{if } F(x) \text{ has no saltus at } b, \\ r'(F) + \tfrac12, & \text{if } F(x) \text{ has a saltus at } b. \end{cases}$$

Wald has shown the following:

Let $F(x)$ and $G(x)$ be two cumulative distribution functions belonging to $\mathcal{F}_b$.

Then,

A) If $w(F)$ and $w(G)$ are both $\le q$, then $F(x) - G(x)$ changes sign at most $2q - 2$ times.

B) If $w(F)$ and $w(G)$ are both $\le q$, $q > 1$, and both $F(x)$ and $G(x)$ have a saltus at $b > 0$, then $F(x) - G(x)$ changes sign at most $2q - 3$ times.

C) If $F(x)$ and $G(x)$ both $\in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$, then $F(x) - G(x)$ changes sign at least $k$ times, unless $F(x) \equiv G(x)$.

Now, suppose there exist two cumulative distribution functions, $F(x)$ and $G(x)$, both $\in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$, both of degree $\le (k + 1)/2$, and both continuous at $b$.

Then, by A, $F(x) - G(x)$ changes sign at most $2r' - 2$ times.

By C, $F(x) - G(x)$ changes sign at least $k$ times.

But, by hypothesis $2r' - 2 \le k - 1$. Hence, $F(x) \equiv G(x)$.

Suppose there exist two cumulative distribution functions $F(x)$ and $G(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$, both of which have a saltus at $b$, and both of which have degree $\le (k + 1)/2$.

Then, by B, $F(x) - G(x)$ changes sign at most $2q - 3$ times.

By C, $F(x) - G(x)$ has at least $k$ changes in sign.

But, $2q - 3 = 2r' - 2 \le k - 1$. Thus, $F(x) \equiv G(x)$.

We note that B requires $q > 1$, which implies $r' > \tfrac12$. However, when $r' = \tfrac12$, the theorem holds trivially by the monotonicity of $g(x)$.

This establishes that there are exactly two cumulative distribution functions $\in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$ with degree $\le (k + 1)/2$, one of which has a saltus at $b$, and the other is continuous at $b$. These are the minimizing and maximizing distributions.

We can now obtain the following theorem.

We can now obtain the following theorem. 





THEOREM 6. The degree of the minimizing (maximizing) cumulative distribution function,
$$F_i^*(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}, \qquad i = 1, 2, \tag{38}$$
is $(k + 1)/2$ unless
$$\mu_k = \mathop{\min\,(\max)}_{F \in \mathcal{F}_b^{(\mu_1, \dots, \mu_{k-1})}}\int_0^b x^k\,dF(x).$$

PROOF: If in Theorem 5, we replace $k$ by $k - 1$, then since $x^k \in \mathcal{G}_b^{(k-1)}$, there are exactly two cumulative distribution functions $\in \mathcal{F}_b^{(\mu_1, \dots, \mu_{k-1})}$ with degree $\le k/2$, and these distributions have the property that they determine the extrema of $\mu_k$ for $F(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_{k-1})}$.

Hence, we may conclude that if $r'(F) < (k + 1)/2$ and $F(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$, then $\mu_k$ is an extremum of $E(X^k)$, where the extremization is over all $F(x) \in \mathcal{F}_b^{(\mu_1, \dots, \mu_{k-1})}$. Thus, the inequality of Theorem 4 is an equality whenever $\mu_k$ is not an extremum determined by $\mu_1, \mu_2, \dots, \mu_{k-1}$.

COROLLARY. The max (min) $\mu_{k+v}$, $v > 0$, for $F \in \mathcal{F}_b^{(\mu_1, \dots, \mu_k)}$ is attained by choosing $F(x)$ to be one of the two cumulative distribution functions with $r'(F) \le (k + 1)/2$, one of which has a saltus at $b$, and the other of which is continuous at $b$.

Thus, we have shown that this technique will enable us to obtain the sharpest possible bounds on higher moments given the first $k$ moments.

If an extremizing cumulative distribution function is of degree $< (k + 1)/2$, we will call the moment sequence $(\mu_1, \mu_2, \dots, \mu_k)$ a degenerate moment sequence.

We can now characterize the two extremizing distributions as follows: 

THEOREM 7. If $(\mu_1, \mu_2, \cdots, \mu_k)$ is non-degenerate and the two extremizing
cumulative distribution functions are $F_b^1(x)$ and $F_b^2(x)$, respectively, then

a) if $k = 2q + 1$, $q$ a non-negative integer,

$F_b^1(x)$ has saltuses at $(k + 1)/2$ points in $(0, b)$,

$F_b^2(x)$ has saltuses at $(k - 1)/2$ points in $(0, b)$ and at both $0$ and $b$;

b) if $k = 2q$,

$F_b^1(x)$ has saltuses at $k/2$ points in $(0, b)$ and at $0$,

$F_b^2(x)$ has saltuses at $k/2$ points in $(0, b)$ and at $b$.

PROOF: The proof is immediate from Theorems 5 and 6 and by noting that
these are the only ways in which cumulative distribution functions of degree
$(k + 1)/2$ can be obtained.





DETERMINING BOUNDS ON INTEGRALS 533 


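We note that whether $F_b^1(x)$ will be a maximizing or minimizing distribution
depends on the particular choice of $g(x) \in \mathfrak{G}^{(k)}$.

We now extend the preceding result to cumulative distribution functions in
$\mathfrak{F}^{(\mu_1, \mu_2, \cdots, \mu_k)}$, that is, to distributions on $[0, \infty)$.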


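THEOREM 8. If $(\mu_1, \mu_2, \cdots, \mu_k)$ is non-degenerate, and $g(x) \in \mathfrak{G}^{(k)}$, then

$$\sup \; (\inf) \int_0^\infty g(x)\, dF(x), \qquad F \in \mathfrak{F}^{(\mu_1, \mu_2, \cdots, \mu_k)},$$

is obtained by one of the following:

$$\int_0^\infty g(x)\, dF^1(x), \qquad F^1(x) \in \mathfrak{F}^{(\mu_1, \cdots, \mu_k)}, \tag{39}$$

where $F^1(x)$ has saltuses at $(k + 1)/2$ points in $(0, \infty)$ for $k = 2q + 1$, $q$ an
integer $\ge 0$, and $F^1(x)$ has saltuses at $k/2$ points in $(0, \infty)$ and at $0$ for $k = 2q$;

or,

$$\lim_{b \to \infty} \int_0^b g(x)\, dF_b^2(x), \qquad F_b^2(x) \in \mathfrak{F}_b^{(\mu_1, \cdots, \mu_k)}, \tag{40}$$

where $F_b^2(x)$ has saltuses at $(k - 1)/2$ points in $(0, b)$ and at both $0$ and $b$ if $k =
2q + 1$, and $F_b^2(x)$ has saltuses at $k/2$ points in $(0, b)$ and at $b$ for $k = 2q$.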

PROOF: For any cumulative distribution function $F(x) \in \mathfrak{F}^{(\mu_1, \cdots, \mu_k)}$ and
any $b_1 > 0$ we have $\int_{b_1}^\infty x\, dF(x) \le \mu_1$ and $b_1 \int_{b_1}^\infty dF(x) \le \int_{b_1}^\infty x\, dF(x)$. Hence,
$\int_{b_1}^\infty dF(x) \le \mu_1/b_1$. Therefore, by choosing $b_1$ sufficiently large, we have, for any
$\epsilon > 0$, $\int_{b_1}^\infty dF(x) < \epsilon$.

Further, Wald [8] has shown that if $\mu_1, \mu_2, \cdots, \mu_k$ is a legitimate moment
sequence, we can find a discrete cumulative distribution function with no more
than $k + 1$ saltuses in $(0, \infty)$ which has these moments. We suppose the last
saltus is at $b_2$. Choose $b > \max(b_1, b_2)$. Then, since $g(x)$ is bounded, i.e.,
$|g(x)| < M$, we have $\left|\int_b^\infty g(x)\, dF(x)\right| < M\epsilon$ for all cumulative distribution functions
in $\mathfrak{F}^{(\mu_1, \cdots, \mu_k)}$. Hence we can employ Theorem 7, and the conclusion follows.

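THEOREM 9. $\lim_{b \to \infty} F_b^2(x) = \tilde{F}^1(x)$, where $\tilde{F}^1(x)$ is the extremizing cumulative
distribution function $F^1(x)$ computed for $k - 1$ moment constraints.

PROOF: From Theorem 7, for every $b$, $F_b^2(x)$ is uniquely determined.

Further, as $b \to \infty$, $F_b^2(b) - F_b^2(b - 0) = O(1/b^k)$, since the saltus at $b$ contributes
$b^k[F_b^2(b) - F_b^2(b - 0)] \le \mu_k$ to the $k$th moment.

Thus, since this saltus contributes only $O(b^{t-k})$ to the $t$th moment,
$\lim_{b \to \infty} \int_0^{b-0} x^t\, dF_b^2(x) = \mu_t$, $t = 1, 2, \cdots, k - 1$. Let

$$\bar{F}_b^2(x) = \begin{cases} F_b^2(x), & x < b, \\ F_b^2(b - 0), & x \ge b. \end{cases}$$

Then, as $b \to \infty$, $\bar{F}_b^2(x)$ converges to $F^1(x)$ computed for $k - 1$ moment
constraints. This is readily seen, as follows:

$\bar{F}_b^2(x)$ converges at all continuity points to a limit function $\tilde{F}(x)$, which is a
cumulative distribution function, has $r'(\tilde{F}) = k/2$, and satisfies $k - 1$ moment
constraints; hence it must be $F^1(x)$ computed for $k - 1$ moment constraints.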







COROLLARY. $\int_0^\infty g(x)\, dF^1(x) = \lim_{b \to \infty} \int_0^b g(x)\, dF_b^2(x)$, where $F_b^2$ is computed
for one more moment constraint than $F^1(x)$.

We now obtain the following theorem, due to Wald [8], as a consequence of
the preceding results.

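THEOREM 10. If $(\mu_1, \mu_2, \cdots, \mu_k)$ is a non-degenerate moment sequence, then
for $(\mu_1, \mu_2, \cdots, \mu_{k+1})$ to be a legitimate moment sequence, it is necessary and
sufficient that

$$\mu_{k+1} \ge \int_0^\infty x^{k+1}\, dF^1(x) = \mu_{k+1}^*.$$

If equality holds, then $(\mu_1, \mu_2, \cdots, \mu_{k+1})$ is a degenerate moment sequence.

PROOF: Let $\mu_{k+1}^* = \int_0^\infty x^{k+1}\, dF^1(x)$, $F^1(x) \in \mathfrak{F}^{(\mu_1, \cdots, \mu_k)}$. Since $F^1(x)$ is a
unique extremal solution, it determines $2r$ quantities $x_i, \lambda_i$; $i = 1, 2, \cdots, r$,
such that $\lambda_1 x_1^t + \lambda_2 x_2^t + \cdots + \lambda_r x_r^t = \mu_t$ $(t = 1, 2, \cdots, k)$, and $0 \le x_1 <
x_2 < \cdots < x_r < \infty$, $\sum_{i=1}^r \lambda_i = 1$. From Theorem 9,

$F_b^2(x) \in \mathfrak{F}_b^{(\mu_1, \cdots, \mu_{k+1})}$

satisfies

$$\lambda_1' x_1'^t + \lambda_2' x_2'^t + \cdots + \lambda_r' x_r'^t + \lambda_{r+1}' b^t = \mu_t \quad (t = 1, 2, \cdots, k + 1),$$

where $\lambda_i' \to \lambda_i$, $x_i' \to x_i$, and $\lambda_{r+1}' = O(1/b^{k+1})$ as $b \to \infty$.

Since $F^1(x)$ provides an extremum of $\mu_{k+1}$, and we have shown the construction
of $F_b^2(x)$, and since $\lambda_{r+1}' b^{k+1} > 0$, we can conclude $\mu_{k+1} \ge \mu_{k+1}^*$, with
equality if $(\mu_1, \mu_2, \cdots, \mu_{k+1})$ is degenerate.

On the other hand, by choosing $b$ sufficiently large, we can produce a
distribution $F_b^2(x)$ for any choice of $\mu_{k+1} \ge \mu_{k+1}^*$.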


Many of the above results may be extended to other unbounded functions by 
similar limiting arguments. 
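For $k = 1$ and $k = 2$ the bound $\mu_{k+1}^*$ of Theorem 10 has a closed form, because $F^1$ is then explicit: a point mass at $\mu_1$ for $k = 1$, and, for $k = 2$, mass $1 - \mu_1^2/\mu_2$ at $0$ and $\mu_1^2/\mu_2$ at $\mu_2/\mu_1$ (the $k = 2$ case of Section 5). The Python sketch below checks the criterion under those assumptions only; the function names are hypothetical, the input is assumed non-degenerate, and larger $k$ is not handled.

```python
def next_moment_lower_bound(mu):
    """Return mu*_{k+1} = E[X^(k+1)] under the extremal F^1, for k = len(mu) in {1, 2}.

    k = 1: F^1 is a point mass at mu_1, so mu*_2 = mu_1**2.
    k = 2: F^1 puts mass 1 - mu_1**2/mu_2 at 0 and mu_1**2/mu_2 at mu_2/mu_1,
           so mu*_3 = (mu_1**2 / mu_2) * (mu_2 / mu_1)**3 = mu_2**2 / mu_1.
    """
    if len(mu) == 1:
        (m1,) = mu
        return m1 ** 2
    if len(mu) == 2:
        m1, m2 = mu
        return m2 ** 2 / m1
    raise ValueError("this sketch only covers k = 1 and k = 2")


def is_legitimate_extension(mu, mu_next):
    """Theorem 10 criterion for a non-degenerate (mu_1, ..., mu_k): mu_{k+1} >= mu*_{k+1}."""
    return mu_next >= next_moment_lower_bound(mu)


# The Example (ii) moments m1 = .2822, m2 = .3564, m3 = .9505 of Section 8:
bound = next_moment_lower_bound([0.2822, 0.3564])
print(round(bound, 4), is_legitimate_extension([0.2822, 0.3564], 0.9505))  # prints 0.4501 True
```

The same check explains the remarks in Section 8: an observed $m_3$ below $m_2^2/m_1$ cannot be the third moment of any distribution on $[0, \infty)$ with the given $m_1, m_2$.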


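5. Computation of extremizing cumulative distribution functions on $[0, \infty)$
for $k = 0, 1, 2, 3$. To find the extremizing cumulative distribution functions,
we have to solve the following system of equations:

$$
\begin{aligned}
\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_r x_r &= \mu_1 \\
\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_r x_r^2 &= \mu_2 \\
&\;\;\vdots \\
\lambda_1 x_1^k + \lambda_2 x_2^k + \cdots + \lambda_r x_r^k &= \mu_k
\end{aligned}
\tag{41}
$$

$\lambda_i \ge 0$, $\sum_{i=1}^r \lambda_i = 1$, $0 \le x_1 < x_2 < \cdots < x_r < \infty$.

Assuming throughout that $(\mu_1, \mu_2, \cdots, \mu_k)$ is a non-degenerate moment sequence,
we can then apply Theorem 8.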
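When $k = 0$, we have trivially

$$F^1(x) = \begin{cases} 0, & x < 0, \\ 1, & 0 \le x, \end{cases} \qquad F_b^2(x) = \begin{cases} 0, & x < b, \\ 1, & b \le x. \end{cases}$$

For $k = 1$,

$$F^1(x) = \begin{cases} 0, & x < \mu_1, \\ 1, & \mu_1 \le x. \end{cases} \tag{43}$$

To determine $F_b^2(x)$, applying Theorem 8, (41) becomes $\lambda_2 b = \mu_1$.
Hence

$$F_b^2(x) = \begin{cases} 0, & x < 0, \\ 1 - \mu_1/b, & 0 \le x < b, \\ 1, & b \le x. \end{cases} \tag{44}$$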
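When $k = 2$, to determine $F^1(x)$, we consider $\lambda_2 x_2 = \mu_1$ and $\lambda_2 x_2^2 = \mu_2$.
Thus

$$x_2 = \frac{\mu_2}{\mu_1}, \qquad \lambda_2 = \frac{\mu_1^2}{\mu_2},$$

$$F^1(x) = \begin{cases} 0, & x < 0, \\ 1 - \mu_1^2/\mu_2, & 0 \le x < \mu_2/\mu_1, \\ 1, & \mu_2/\mu_1 \le x. \end{cases}$$

For $F_b^2(x)$, we solve (41), which becomes

$$\lambda_1 x_1 + \lambda_2 b = \mu_1, \qquad \lambda_1 x_1^2 + \lambda_2 b^2 = \mu_2, \qquad \lambda_1 + \lambda_2 = 1,$$

obtaining

$$\lambda_1 = \frac{(\mu_1 - b)^2}{(\mu_1 - b)^2 + (\mu_2 - \mu_1^2)}, \qquad \lambda_2 = \frac{\mu_2 - \mu_1^2}{(\mu_1 - b)^2 + (\mu_2 - \mu_1^2)}, \qquad x_1 = \frac{\mu_1 b - \mu_2}{b - \mu_1}.$$

Hence,

$$F_b^2(x) = \begin{cases} 0, & x < x_1, \\[4pt] \dfrac{(\mu_1 - b)^2}{(\mu_1 - b)^2 + (\mu_2 - \mu_1^2)}, & x_1 \le x < b, \\[4pt] 1, & b \le x. \end{cases}$$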







Note that as $b \to \infty$, $\lambda_1 \to 1$, $\lambda_2 \to 0$, $x_1 \to \mu_1$, $\lambda_2 b \to 0$, and $\lambda_2 b^2 \to \mu_2 - \mu_1^2$,
and $F_b^2(x)$ is $F^1(x)$ for $k = 1$ with the addition of the infinitesimal mass at $\infty$.
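A quick numerical check of the $k = 2$ solution for $F_b^2(x)$, namely $x_1 = (\mu_1 b - \mu_2)/(b - \mu_1)$ and $\lambda_2 = (\mu_2 - \mu_1^2)/\{(\mu_1 - b)^2 + (\mu_2 - \mu_1^2)\}$, is sketched below. The helper names are hypothetical, and the moment values are illustrative inputs assumed to satisfy $\mu_2 > \mu_1^2$ and $b > \mu_2/\mu_1$.

```python
def f2_b_k2(m1, m2, b):
    """Atoms (point, mass) of F_b^2 for k = 2: one point x1 in (0, b) and a saltus at b."""
    denom = (m1 - b) ** 2 + (m2 - m1 ** 2)
    x1 = (m1 * b - m2) / (b - m1)
    lam1 = (m1 - b) ** 2 / denom
    lam2 = (m2 - m1 ** 2) / denom
    return [(x1, lam1), (b, lam2)]


def moments(atoms, k):
    """[E X, E X^2, ..., E X^k] for a discrete distribution given as (point, mass) pairs."""
    return [sum(w * x ** t for x, w in atoms) for t in range(1, k + 1)]


m1, m2 = 0.2822, 0.3564  # illustrative inputs with m2 > m1**2
for b in (10.0, 1e3, 1e6):
    (x1, lam1), (_, lam2) = f2_b_k2(m1, m2, b)
    # moments are reproduced for every b; x1 -> m1, lam2*b -> 0, lam2*b**2 -> m2 - m1**2
    print(b, moments(f2_b_k2(m1, m2, b), 2), x1, lam2 * b, lam2 * b ** 2)
```

Watching the last three columns as $b$ grows shows the saltus at $b$ shrinking to the "infinitesimal mass at $\infty$" that still carries the variance $\mu_2 - \mu_1^2$.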
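When $k = 3$, for $F^1(x)$ consider

$$\lambda_1 x_1 + \lambda_2 x_2 = \mu_1, \tag{1}$$

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 = \mu_2, \tag{2}$$

$$\lambda_1 x_1^3 + \lambda_2 x_2^3 = \mu_3, \tag{3}$$

with $\lambda_1 + \lambda_2 = 1$. From (1) and (2) we have

$$\lambda_1 = \frac{(\mu_1 - x_2)^2}{(\mu_1 - x_2)^2 + (\mu_2 - \mu_1^2)}, \qquad \lambda_2 = \frac{\mu_2 - \mu_1^2}{(\mu_1 - x_2)^2 + (\mu_2 - \mu_1^2)}, \qquad x_1 = \frac{\mu_1 - (1 - \lambda_1) x_2}{\lambda_1}.$$

Substituting these in (3), we get

$$\frac{\{\mu_1(\mu_1 - x_2) + (\mu_2 - \mu_1^2)\}^3}{(\mu_1 - x_2)\{(\mu_1 - x_2)^2 + (\mu_2 - \mu_1^2)\}} + \frac{(\mu_2 - \mu_1^2)\, x_2^3}{(\mu_1 - x_2)^2 + (\mu_2 - \mu_1^2)} = \mu_3.$$

Setting $\mu_1 - x_2 = y$ and $\mu_2 - \mu_1^2 = \sigma^2$, we have

$$\frac{(\mu_1 y + \sigma^2)^3}{y(y^2 + \sigma^2)} + \frac{\sigma^2(\mu_1 - y)^3}{y^2 + \sigma^2} = \mu_3.$$

Thus

$$-\sigma^2 y^4 + (\mu_1^3 + 3\mu_1\sigma^2 - \mu_3) y^3 + \sigma^2(3\mu_1\sigma^2 + \mu_1^3 - \mu_3) y + \sigma^6 = 0.$$

Setting $\tilde{\mu}_3 = \mu_3 - 3\mu_1\sigma^2 - \mu_1^3$, we have

$$f(y) = \sigma^2 y^4 + \tilde{\mu}_3 y^3 + \sigma^2 \tilde{\mu}_3 y - \sigma^6 = 0. \tag{47}$$

From (41), it follows that $y < 0$. Thus we are interested in negative real
zeros of (47). From Descartes' Rule of Signs, and the observations that $f(0) < 0$
and $f(-\infty) > 0$, there is exactly one negative real root.

We proceed to find the root. First, observe that

$$f(y) = \sigma^2 (y^2 + \sigma^2)\left(y^2 + \frac{\tilde{\mu}_3}{\sigma^2}\, y - \sigma^2\right).$$

Hence, the unique negative root $y_0$ is given by

$$y_0 = -\frac{\tilde{\mu}_3 + (\tilde{\mu}_3^2 + 4\sigma^6)^{1/2}}{2\sigma^2},$$

which provides the complete solution:

$$\lambda_1 = \frac{y_0^2}{y_0^2 + \sigma^2}, \qquad \lambda_2 = \frac{\sigma^2}{y_0^2 + \sigma^2}, \qquad x_1 = \frac{\mu_1 y_0 + \sigma^2}{y_0}, \qquad x_2 = \mu_1 - y_0,$$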








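$$F^1(x) = \begin{cases} 0, & x < \dfrac{\mu_1 y_0 + \sigma^2}{y_0}, \\[6pt] \dfrac{y_0^2}{y_0^2 + \sigma^2}, & \dfrac{\mu_1 y_0 + \sigma^2}{y_0} \le x < \mu_1 - y_0, \\[6pt] 1, & \mu_1 - y_0 \le x. \end{cases} \tag{48}$$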
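The closed-form solution (47)-(48) can be sketched in Python as follows. It computes $\sigma^2 = \mu_2 - \mu_1^2$, $\tilde\mu_3 = \mu_3 - 3\mu_1\sigma^2 - \mu_1^3$, takes the negative root $y_0$ of the quadratic factor $\sigma^2 y^2 + \tilde\mu_3 y - \sigma^4$, and returns the two atoms. The function name is hypothetical, and the input is assumed non-degenerate ($\sigma^2 > 0$, with $\mu_3$ large enough that $x_1 \ge 0$).

```python
import math


def f1_k3(mu1, mu2, mu3):
    """Two-point extremal F^1 for k = 3 on [0, inf), following (47)-(48).

    sigma2 = mu2 - mu1**2; t3 = mu3 - 3*mu1*sigma2 - mu1**3 (the third central moment);
    y0 is the negative root of sigma2*y**2 + t3*y - sigma2**2 = 0, the quadratic factor of (47).
    """
    sigma2 = mu2 - mu1 ** 2
    t3 = mu3 - 3 * mu1 * sigma2 - mu1 ** 3
    y0 = -(t3 + math.sqrt(t3 ** 2 + 4 * sigma2 ** 3)) / (2 * sigma2)
    lam1 = y0 ** 2 / (y0 ** 2 + sigma2)
    lam2 = sigma2 / (y0 ** 2 + sigma2)
    x1 = (mu1 * y0 + sigma2) / y0
    x2 = mu1 - y0
    return [(x1, lam1), (x2, lam2)]


def moments(atoms, k):
    """[E X, ..., E X^k] for a list of (point, mass) pairs."""
    return [sum(w * x ** t for x, w in atoms) for t in range(1, k + 1)]


# Sanity check: masses 1/2 at 1 and at 3 have moments (2, 5, 14); the solver recovers them.
print(f1_k3(2.0, 5.0, 14.0))  # [(1.0, 0.5), (3.0, 0.5)]
# The Example (ii) moments quoted in Section 8:
atoms = f1_k3(0.2822, 0.3564, 0.9505)
print(atoms, moments(atoms, 3))
```

Solving the quadratic factor directly, rather than running a numerical root finder on the quartic (47), is exactly what the factorization above buys.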
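To obtain $F_b^2(x)$, the system of equations

$$\lambda_2 x_2 + \lambda_3 b = \mu_1, \qquad \lambda_2 x_2^2 + \lambda_3 b^2 = \mu_2, \qquad \lambda_2 x_2^3 + \lambda_3 b^3 = \mu_3$$

is easily solved, yielding

$$\lambda_2 = \frac{(\mu_1 b - \mu_2)^3}{(\mu_2 b - \mu_3)(\mu_1 b^2 - 2\mu_2 b + \mu_3)}, \qquad \lambda_3 = \frac{\mu_1\mu_3 - \mu_2^2}{b(\mu_1 b^2 - 2\mu_2 b + \mu_3)}, \qquad x_2 = \frac{\mu_2 b - \mu_3}{\mu_1 b - \mu_2},$$

$$F_b^2(x) = \begin{cases} 0, & x < 0, \\ 1 - \lambda_2 - \lambda_3, & 0 \le x < \dfrac{\mu_2 b - \mu_3}{\mu_1 b - \mu_2}, \\[6pt] 1 - \lambda_3, & \dfrac{\mu_2 b - \mu_3}{\mu_1 b - \mu_2} \le x < b, \\[6pt] 1, & b \le x. \end{cases}$$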


As $b \to \infty$, it is easily observed that $F_b^2(x) \to F^1(x)$ for $k = 2$, with the
addition of an infinitesimal mass at $\infty$.
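The convergence $F_b^2(x) \to F^1(x)$ (the $k = 2$ solution, with mass $\mu_1^2/\mu_2$ at $\mu_2/\mu_1$) can be watched numerically. The sketch below assumes non-degenerate inputs with $\mu_1 \mu_3 > \mu_2^2$ and $b$ large enough that all masses are positive; the Example (ii) moments are used purely as illustrative numbers, and the helper names are hypothetical.

```python
def f2_b_k3(mu1, mu2, mu3, b):
    """Atoms of F_b^2 for k = 3: saltuses at 0, at one point x2 in (0, b), and at b."""
    q = mu1 * b * b - 2 * mu2 * b + mu3
    x2 = (mu2 * b - mu3) / (mu1 * b - mu2)
    lam2 = (mu1 * b - mu2) ** 3 / ((mu2 * b - mu3) * q)
    lam3 = (mu1 * mu3 - mu2 ** 2) / (b * q)
    return [(0.0, 1.0 - lam2 - lam3), (x2, lam2), (b, lam3)]


def moments(atoms, k):
    """[E X, ..., E X^k] for a list of (point, mass) pairs."""
    return [sum(w * x ** t for x, w in atoms) for t in range(1, k + 1)]


mu = (0.2822, 0.3564, 0.9505)  # illustrative (the Example (ii) moments)
for b in (50.0, 1e3, 1e6):
    atoms = f2_b_k3(*mu, b)
    print(b, moments(atoms, 3), atoms)
# Limit for comparison: F^1 for k = 2 puts mu1**2/mu2 at mu2/mu1 and the rest at 0.
print(mu[1] / mu[0], mu[0] ** 2 / mu[1])
```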


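6. Application of Extrema Computations to $\varphi(x)$ and $\psi(x)$. First note that
$\varphi(x)$ satisfies the hypotheses of Theorem 4 for all $k$; in fact, $\varphi(x)$ is completely
monotonic. We can see this as follows:

$$
\begin{aligned}
\varphi^{(\nu)}(x) &= \sum_{l=0}^{\nu} \binom{\nu}{l} \left(\frac{1}{x}\right)^{(\nu - l)} \left(1 - e^{-(a-1)x}\right)^{(l)} \\
&= \frac{(-1)^\nu \nu!}{x^{\nu+1}}\left(1 - e^{-(a-1)x}\right) + \sum_{l=1}^{\nu} \binom{\nu}{l} \frac{(-1)^{\nu-l}(\nu-l)!}{x^{\nu-l+1}}\, (-1)^{l+1}(a-1)^l e^{-(a-1)x} \\
&= \frac{(-1)^\nu \nu!}{x^{\nu+1}}\, e^{-(a-1)x} \left[ e^{(a-1)x} - \sum_{i=0}^{\nu} \frac{((a-1)x)^i}{i!} \right].
\end{aligned}
$$

Since the bracketed factor is positive for $x > 0$ (it is the tail of the power series for $e^{(a-1)x}$), we have

$$\varphi^{(\nu)}(x) < 0 \ (> 0), \qquad \nu \text{ odd (even)},$$

for all $x \in [0, \infty)$ (at $x = 0$ by continuity).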
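The derivative formula and the sign alternation can be cross-checked numerically. The sketch assumes, as in Section 7, that $\varphi(x) = \int_0^{a-1} e^{-tx}\,dt = (1 - e^{-(a-1)x})/x$, so that $\varphi^{(\nu)}(x) = \int_0^{a-1} (-t)^\nu e^{-tx}\,dt$; a plain midpoint rule (our choice, nothing from the paper) evaluates that integral. The value $a = 3$ is illustrative.

```python
import math


def phi_derivative_closed_form(nu, x, a):
    """(-1)^nu nu! / x^(nu+1) * [1 - e^{-(a-1)x} * sum_{i=0}^{nu} ((a-1)x)^i / i!], x > 0."""
    c = a - 1.0
    partial = sum((c * x) ** i / math.factorial(i) for i in range(nu + 1))
    return (-1) ** nu * math.factorial(nu) / x ** (nu + 1) * (1.0 - math.exp(-c * x) * partial)


def phi_derivative_quadrature(nu, x, a, steps=20000):
    """Midpoint rule for int_0^{a-1} (-t)^nu e^{-t x} dt, the nu-th derivative of
    phi(x) = int_0^{a-1} e^{-t x} dt = (1 - e^{-(a-1)x}) / x."""
    c = a - 1.0
    h = c / steps
    return h * sum((-(k + 0.5) * h) ** nu * math.exp(-(k + 0.5) * h * x) for k in range(steps))


a = 3.0  # illustrative enlargement factor
for nu in range(5):
    for x in (0.1, 1.0, 5.0):
        closed = phi_derivative_closed_form(nu, x, a)
        # the two columns agree, and the sign is (-1)^nu
        print(nu, x, closed, phi_derivative_quadrature(nu, x, a))
```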







Since (30) may be written

$$E\{C(a)\} = 1 - \int_0^\infty e^{-(a-1)x}\, dF(x),$$

we observe readily that $e^{-(a-1)x}$ also satisfies the hypotheses of the theorem,
and in fact is also completely monotonic.

When $k = 0$, $F^1(x)$ provides the maximum of $E\{\varphi(x)\}$ and $F_b^2(x)$ provides
the infimum. From Theorem 9, we conclude that the situation is reversed for
$k = 1$, i.e., $F^1(x)$ is a minimizing distribution and $F_b^2(x)$ provides the supremum. In general, we note this alternation every time an additional moment
constraint is added. Further, the extremum which is attained for any value of $k$
is not improved by the addition of the $(k + 1)$st moment constraint.

When $k = 0$, $F^1(x)$ provides the minimum of $E\{\psi(x)\}$, and $F_b^2(x)$ provides
the supremum. Thus the solutions for $\varphi(x)$ and $\psi(x)$ are identical for every $k$
upon interchanging supremum and infimum.

It is also interesting to observe the behavior of the upper and lower predictors
for $d(a)$ and $C(a)$ as $a \to \infty$. From the alternation property noted above
and from Theorem 8, since the supremum of $E\{\varphi(x)\}$ has a mass point at zero
for all $k$, and $\varphi(0) = a - 1$, the upper predictor for $d(a) \to \infty$ as $a \to \infty$. Similarly, we can
observe that, as $a \to \infty$, the lower predictor tends to a limit. This suggests the
following interpretation: the upper predictor tends to infinity since, regardless
of the sample size, there is no way for the experimenter to establish the nonexistence of an arbitrarily large number of classes, each with negligible probability; the lower predictor tends to a limit, since there is no way for the experimenter to conclude that he will not observe all classes eventually.

Similarly, one can readily see that the upper predictor for $C(a) \to 1$ as $a \to
\infty$, and the lower predictor tends to a limit $\xi$, $0 < \xi \le 1$. This has a similar
interpretation to the corresponding result for $d(a)$.


7. Historical remarks. Problems of this type have previously been investigated in papers by Corbet, Fisher, and Williams [2], Goodman [5], and Good
and Toulmin [4].

The Corbet, Fisher, and Williams paper employs a parametric hypothesis
as follows:

It is assumed that, for any class, the number of representatives in the sample
has a Poisson distribution with mean $m$, where the values of $m$ are distributed
according to a $\Gamma$-type distribution. Then the expected number of classes observed is given by

$$E(d) = -\lambda \log(1 - y) \tag{50}$$

and

$$E(N) = \frac{\lambda y}{1 - y}, \tag{51}$$

where $\lambda$ is independent of $N$, the sample size, and $y/(1 - y)$ is proportional
to $N$.







Then, if the sample size is augmented to $aN$, or a second independent sample
of size $aN$ is observed, we have

$$aN \approx \frac{\lambda y_a}{1 - y_a} \tag{52}$$

and

$$E\{d(a)\} \approx \lambda \log\left\{\frac{aN + \lambda}{\lambda}\right\},$$

where $\lambda$ is determined from (50) and (51). Since

$$E\{d(a)\} \approx d + \lambda \log\left\{\frac{aN + \lambda}{N + \lambda}\right\}, \tag{53}$$

we conclude that, for large $N$, the Corbet-Fisher-Williams hypothesis implies
that the number of new classes to be observed when the sample is augmented by
$(a - 1)N$ is approximately $\lambda \log a$.
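The Corbet-Fisher-Williams prediction takes only a few lines. Eliminating $y$ between (50) and (51) gives $d = \lambda \log(1 + N/\lambda)$, which is increasing in $\lambda$, so a bisection (our choice of solver) recovers $\lambda$; then $E\{d(a)\} \approx \lambda \log(1 + aN/\lambda)$. As an assumption, the sketch takes the Example (iii) values $N = 15{,}609$ and $d = 240$ (the constant lower-predictor column of Table 8, which equals the observed number of species). With these inputs it lands within about $0.1$ of the $E\{d(2)\} \approx 267.9$ reported in Section 8.

```python
import math


def cfw_lambda(d, n, iters=200):
    """Solve d = lam * log(1 + n/lam) for lam, i.e. (50) after eliminating y with (51).

    lam * log(1 + n/lam) increases from 0 to n as lam goes from 0 to infinity,
    so bisection works whenever 0 < d < n.
    """
    lo, hi = 1e-9, 1e12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * math.log1p(n / mid) < d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cfw_predict(d, n, a):
    """E{d(a)} ~ lam * log(1 + a*n/lam), with lam fitted from the observed (d, n)."""
    lam = cfw_lambda(d, n)
    return lam * math.log1p(a * n / lam), lam


pred, lam = cfw_predict(240, 15609, 2)  # assumed Example (iii) inputs
print(round(pred, 1), round(lam, 2), round(lam * math.log(2), 1))
```

The last printed number is the large-$N$ shortcut $\lambda \log a$ for the new classes, which agrees closely with `pred - 240` here.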

Goodman considered the following problem: 

A population with a known finite number of elements is partitioned into an 
unknown number of disjoint classes. The classes are assumed to possess no 
natural ordering. A random sample is drawn without replacement and we wish 
to estimate the number of classes in the population. 

It is easily seen that this problem is identical to the problem of predicting 
the number of classes that will be observed in the case of an augmented sample. 

Goodman has shown that, in general, an unbiased estimator of $E\{d(a)\}$ will
not exist. However, if the maximum frequency of any class in the enlarged sample is known to be less than $N$, then the following estimator is unbiased:


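$$d_{G_1}(a) = \sum_{i=1}^{N} A_i n_i, \tag{54}$$

where

$$A_1 = a, \qquad A_i = 1 - (-1)^i \frac{(aN - N + i - 1)!\,(N - i)!}{(aN - N - 1)!\,N!} = 1 - (-1)^i \prod_{j=0}^{i-1} \frac{aN - N + j}{N - j} \quad \text{for } i > 1.$$

Since $d_{G_1}(a)$ may give unreasonable answers, $d_{G_2}(a)$ was proposed by Goodman:

$$d_{G_2}(a) = \begin{cases} d, & d_{G_1}(a) < d, \\ d_{G_1}(a), & d \le d_{G_1}(a) \le aN, \\ aN, & aN < d_{G_1}(a). \end{cases}$$

Goodman also proposed

$$d_{G_3}(a) = \begin{cases} d, & aN - \dfrac{a(aN - 1)}{N - 1}\, n_2 < d, \\[6pt] aN - \dfrac{a(aN - 1)}{N - 1}\, n_2, & \text{otherwise}. \end{cases} \tag{56}$$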
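Unbiasedness here means that every class of size $m \le N$ in the enlarged collection of $M = aN$ items contributes, in expectation, exactly one to $\sum_i A_i n_i$ when $N$ items are drawn without replacement. The sketch below checks this exactly with rational arithmetic, assuming Goodman's closed form $A_i = 1 - (-1)^i \prod_{j=0}^{i-1} (M - N + j)/(N - j)$ (so $A_1 = a$) and an integer $M > N$. The function names and the sizes $N = 5$, $M = 10$ are illustrative.

```python
from fractions import Fraction
from math import comb


def goodman_coefficients(N, M):
    """A_1, ..., A_N with A_i = 1 - (-1)^i * prod_{j=0}^{i-1} (M - N + j) / (N - j).

    Exact rationals avoid the cancellation that the alternating, growing A_i would cause.
    """
    coeffs = []
    ratio = Fraction(1)
    for i in range(1, N + 1):
        ratio *= Fraction(M - N + i - 1, N - i + 1)
        coeffs.append(1 - (-1) ** i * ratio)
    return coeffs


def expected_weight(coeffs, N, M, m):
    """E[A_I] (A_0 = 0), where I ~ hypergeometric count of a size-m class among N of M draws."""
    total = Fraction(0)
    for i in range(1, min(m, N) + 1):
        total += coeffs[i - 1] * Fraction(comb(m, i) * comb(M - m, N - i), comb(M, N))
    return total


def goodman_clipped(estimate, d, M):
    """d_{G_2}: clip the unbiased estimate to the admissible range [d, aN]."""
    return min(max(estimate, d), M)


N, M = 5, 10  # a = 2; illustrative sizes
A = goodman_coefficients(N, M)
print(A)
print([expected_weight(A, N, M, m) for m in range(1, N + 1)])  # each equals 1
```

The printed coefficients alternate in sign and grow quickly, which is why $d_{G_1}(a)$ can land outside $[d, aN]$ and why the clipped $d_{G_2}(a)$ was proposed.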







Good and Toulmin have obtained the following predictors:

$$d(a) = d - \sum_{i=1}^{\infty} (-1)^i (a - 1)^i n_i \tag{57}$$

and

$$C(a) = 1 - \frac{1}{N} \sum_{i=1}^{\infty} (-1)^{i+1}\, i\, n_i (a - 1)^{i-1}. \tag{58}$$

We present here an alternative derivation of (57) and (58), which will exhibit their development from the moments (31).

Since $\varphi(x)$ is completely monotonic in $[0, \infty)$, it is a Laplace transform with
respect to a monotonic non-decreasing function of bounded variation in $[0, \infty)$,
i.e.,

$$\varphi(x) = \int_0^\infty e^{-tx}\, dG(t),$$

where $G(t)$ is non-decreasing and $G(\infty) < \infty$. Hence

$$\int_0^\infty \varphi(x)\, dF(x) = \int_0^\infty \int_0^\infty e^{-tx}\, dG(t)\, dF(x),$$

where $F(x)$ is a cumulative distribution function. The inverse Laplace transform is easily seen to be

$$G(t) = \begin{cases} t, & 0 \le t \le a - 1, \\ a - 1, & a - 1 < t. \end{cases}$$

Hence

$$\int_0^\infty \varphi(x)\, dF(x) = \int_0^\infty \int_0^{a-1} e^{-tx}\, dt\, dF(x).$$

Interchanging the order of integration, we have

$$\int_0^\infty \varphi(x)\, dF(x) = \int_0^{a-1} M_{-X}(t)\, dt,$$

where $M_{-X}(t)$ is the moment generating function of $(-X)$. Since

$$E X^r \approx \frac{(r + 1)!\, n_{r+1}}{n_1},$$

we have

$$M_{-X}(t) = \sum_{r=0}^{\infty} \frac{(-t)^r (r + 1)\, n_{r+1}}{n_1}.$$

Upon integrating $M_{-X}(t)$ term by term, we get

$$n_1 \int_0^\infty \frac{1 - e^{-(a-1)x}}{x}\, dF(x) = \sum_{r=0}^{\infty} (-1)^r n_{r+1} (a - 1)^{r+1},$$

from which we obtain (57).
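A minimal sketch of the two Good-Toulmin predictors (57) and (58) for a finite table of frequencies $n_i$, together with a check of the term-by-term integration step for a single point mass at $x$, where $\sum_r (-x)^r (a-1)^{r+1}/(r+1)!$ must equal $(1 - e^{-(a-1)x})/x$. The frequencies below are hypothetical and the function names are our own; the sample size is taken as $N = \sum_i i\, n_i$.

```python
import math


def good_toulmin_d(d, n, a):
    """(57): d(a) = d - sum_i (-1)^i (a - 1)^i n_i, with n = {i: n_i} (finite dict)."""
    return d - sum((-1) ** i * (a - 1) ** i * ni for i, ni in n.items())


def good_toulmin_coverage(n, a):
    """(58): C(a) = 1 - (1/N) sum_i (-1)^(i+1) i n_i (a - 1)^(i-1), with N = sum_i i n_i."""
    N = sum(i * ni for i, ni in n.items())
    return 1 - sum((-1) ** (i + 1) * i * ni * (a - 1) ** (i - 1) for i, ni in n.items()) / N


def phi_series(x, a, terms=60):
    """Term-by-term integral of M_{-X}(t) over [0, a-1] for a point mass at x:
    sum_r (-x)^r (a - 1)^(r + 1) / (r + 1)!, which should equal (1 - e^{-(a-1)x}) / x."""
    return sum((-x) ** r * (a - 1) ** (r + 1) / math.factorial(r + 1) for r in range(terms))


counts = {1: 10, 2: 4, 3: 1}  # hypothetical frequencies n_i: d = 15 classes, N = 21 items
print(good_toulmin_d(15, counts, 2), good_toulmin_coverage(counts, 2))
print(phi_series(0.7, 1.5), (1 - math.exp(-0.5 * 0.7)) / 0.7)
```

At $a = 1$ the predictors reduce to $d$ and $C(1) = 1 - n_1/N$; for $a \ge 2$ every $n_i$ enters with weight at least one, which is the sensitivity to higher moments discussed below.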







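Similarly, we can obtain (58) by noting that

$$\int_0^\infty [1 - \psi(x)]\, dF(x) = \int_0^\infty \int_0^\infty e^{-tx}\, dH(t)\, dF(x),$$

where

$$H(t) = \begin{cases} 0, & t < a - 1, \\ 1, & t \ge a - 1, \end{cases}$$

so that, by interchanging the order of integration and introducing the moments,
we have

$$\int_0^\infty [1 - \psi(x)]\, dF(x) = \sum_{r=0}^{\infty} \frac{(-1)^r (r + 1)\, n_{r+1} (a - 1)^r}{n_1}.$$

Then, from (33), we get

$$E\{C(a)\} \approx 1 - \frac{1}{N} \sum_{r=0}^{\infty} (-1)^r (r + 1)\, n_{r+1} (a - 1)^r.$$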


Whenever $a \ge 2$, the Good-Toulmin formulas depend heavily on the higher
moments, since the weights $(a - 1)^r$ no longer decrease, and these moments the experimenter knows with less precision. However, in the
Good-Toulmin paper, this difficulty is largely circumvented by transforming
the series, so that a sum can be obtained from any of several methods for summation of divergent series.


8. Numerical examples. Three examples have been chosen to illustrate the 
methodology of this paper. The first is artificially constructed. The second 
appears in Good and Toulmin [4], and the third appears in both Good and 
Toulmin and Corbet, Fisher, and Williams [2]. 

Example (i). One hundred observations were taken from a multinomial population with 100 equiprobable cells. The data are summarized in Table 1.


TABLE 1 





The upper and lower predictors for the number of classes, employing the first
$k$ moments, $\bar{d}_k(a)$ and $\underline{d}_k(a)$, are given in Table 2.










TABLE 2

a  | $\bar{d}_0(a)$ | $\underline{d}_0(a)$ | $\underline{d}_1(a)$ | $\bar{d}_2(a)$
2  | 108 | 67 | 93.7  | 94.4
3  | 149 | 67 | 104.3 | 107.9
4  | 190 | 67 | 108.5 | 116.8
5  | 231 | 67 | 110.2 | 124.2
10 | 436 | 67 | 111.2 | 157.7
∞  | ∞   | 67 | 111.2 | ∞





In this case we are unable to proceed to $d_3(a)$, since $m_3$ does not satisfy the
conditions of Theorem 10.

We also note that $\underline{d}_1(a)$ exceeds 100 for $a > 2$. Since $E(d) = 100[1 - (0.99)^{100}] \approx 63.4$ and the
observed value of $d$ was 67, it is clear that our predictors should give answers
which are a little high in this case.


The predictors of the coverage, $\bar{C}_k(a)$ and $\underline{C}_k(a)$, are given in Table 3.











TABLE 3

a  | $\bar{C}_0(a)$ | $\underline{C}_0(a)$ | $\bar{C}_1(a)$ | $\underline{C}_2(a)$
2  | 1.00 | .59 | .838  | .820
3  | 1.00 | .59 | .936  | .896
4  | 1.00 | .59 | .975  | .921
5  | 1.00 | .59 | .990  | .930
10 | 1.00 | .59 | 1.000 | .934
∞  | 1.00 | .59 | 1.000 | .934

C(1) = .59
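The $k = 1$ and $k = 2$ columns of Tables 2 and 3 can be regenerated from the explicit extremal distributions of Section 5, using $d(a) \approx d + n_1 \int \varphi\, dF$ and $C(a) \approx 1 - (n_1/N) \int e^{-(a-1)x}\, dF$ with $m_r \approx (r+1)!\, n_{r+1}/n_1$. Table 1 did not survive, so the inputs are partly an assumption: $N = 100$ and $d = 67$ come from the text, $n_1 = 41$ follows from $C(1) = 1 - n_1/N = .59$, but $n_2 = 19$ and $n_3 = 7$ were back-solved so that the sketch reproduces Tables 2 and 3, and should not be read as the original data.

```python
import math

# ASSUMED Example (i) summary: N, d, n1 from the text; n2, n3 back-solved from Tables 2-3.
N, d = 100, 67
n1, n2, n3 = 41, 19, 7
m1 = 2 * n2 / n1  # E X   ~ 2! n_2 / n_1
m2 = 6 * n3 / n1  # E X^2 ~ 3! n_3 / n_1


def phi(x, a):
    """phi(x) = (1 - e^{-(a-1)x}) / x, with phi(0) = a - 1."""
    return a - 1 if x == 0 else (1 - math.exp(-(a - 1) * x)) / x


def predictors(a):
    """Return (lower d_1, upper d_2, upper C_1, lower C_2) at enlargement factor a."""
    # k = 1: F^1 is a point mass at m1.
    lower_d1 = d + n1 * phi(m1, a)
    upper_c1 = 1 - (n1 / N) * math.exp(-(a - 1) * m1)
    # k = 2: F^1 puts mass 1 - p at 0 and p = m1**2 / m2 at x2 = m2 / m1.
    p, x2 = m1 ** 2 / m2, m2 / m1
    upper_d2 = d + n1 * ((1 - p) * phi(0, a) + p * phi(x2, a))
    lower_c2 = 1 - (n1 / N) * ((1 - p) + p * math.exp(-(a - 1) * x2))
    return lower_d1, upper_d2, upper_c1, lower_c2


for a in (2, 3, 4, 5, 10):
    print(a, [round(v, 3) for v in predictors(a)])
```

The $k = 0$ columns need no computation: $\underline{d}_0(a) = d$, $\bar{d}_0(a) = d + n_1(a - 1)$, $\underline{C}_0(a) = 1 - n_1/N$ and $\bar{C}_0(a) = 1$.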


Example (ii). This example is due to Good and Toulmin [4]. One thousand words from
"Our Mutual Friend" by Charles Dickens were tabulated, and the results are
given in Table 4. It is to be noted that the method of sampling, i.e., choosing
the last words of lines on pages congruent to 5 modulo 25, is not random sampling; however, the data will nevertheless suffice to illustrate the method.


TABLE 4

r | $n_r$ | r  | $n_r$
1 | 404   | 6  | 3
2 | 57    | 7  | 0
3 | 24    | 8  | 3
4 | 16    | ≥9 | 15
5 | 6     |    |

$m_1 = .2822$, $m_2 = .3564$, $m_3 = .9505$.






We obtain as predictors for the number of classes the results shown in Table 5. 


TABLE 5

$\underline{d}_0(a)$
528
528
528    1438
528    1648
528    2286
528    2737


Good and Toulmin have computed d(5) using their predictor and get 1683. 
The upper and lower predictors employing three moments are 1854 and 1643. 
The predictors of the coverage are given in Table 6. 


TABLE 6

$\bar{C}_0(a)$ | $\underline{C}_0(a)$
1.000 | .596
1.000 | .596
1.000 | .596
1.000 | .596
1.000 | .596
1.000 | .596

C(1) = .596.


Example (iii). This example is due to Corbet, Fisher, and Williams [2]. 15,609 
Macrolepidoptera were caught in a light trap at Rothamsted and classified by 
species. The data are summarized in Table 7. 


TABLE 7 


ny 










The upper and lower predictors for the number of classes employing the first $k$
moments, $\bar{d}_k(a)$ and $\underline{d}_k(a)$, are given in Table 8.











TABLE 8

a  | $\bar{d}_0(a)$ | $\underline{d}_0(a)$ | $\underline{d}_1(a)$ | $\bar{d}_2(a)$
2  | 275 | 240 | 266.0 | 270.9
3  | 310 | 240 | 279.8 | 300.6
4  | 345 | 240 | 287.2 | 330.2
5  | 380 | 240 | 291.2 | 359.8
10 | 555 | 240 | 295.5 | 507.9
∞  | ∞   | 240 | 295.7 | ∞





Note that here, as in Example (i), we are unable to proceed to $d_3(a)$, since $m_3$
does not satisfy the condition of Theorem 10, and hence only $m_1$ and $m_2$ are realizable as moments of a cumulative distribution function on $[0, \infty)$.

The corresponding predictors of coverage are given in Table 9. 

















TABLE 9

a  | $\bar{C}_0(a)$ | $\underline{C}_0(a)$ | $\bar{C}_1(a)$ | $\underline{C}_2(a)$
2  | 1.0000 | .9978 | .9988  | .9981
3  | 1.0000 | .9978 | .9994  | .9981
4  | 1.0000 | .9978 | .9997  | .9981
5  | 1.0000 | .9978 | .9998  | .9981
10 | 1.0000 | .9978 | 1.0000 | .9981
∞  | 1.0000 | .9978 | 1.0000 | .9981

C(1) = .9978.


Employing the parametric hypotheses of Corbet, Fisher, and Williams, we
have $E\{d(2)\} \approx 267.9$.

Good and Toulmin obtain $d(2) = 261.9$.
Williams (in Corbet et al. [2]) noted that doubling the sample size would approximately halve the proportion of the population not represented in the sample.
In Good and Toulmin, $C(2)$ is given as .9991.


APPENDIX A 


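In Section 2, the following approximations are employed:

$$\sum_{j} \left[1 - (1 - p_j)^N\right] \approx \sum_{j} \left(1 - e^{-Np_j}\right), \tag{1}$$

$$\sum_{j} \binom{N}{r} p_j^r (1 - p_j)^{N-r} \approx \sum_{j} \frac{(Np_j)^r e^{-Np_j}}{r!}, \tag{2}$$

$$1 - \sum_{j} p_j (1 - p_j)^N \approx 1 - \sum_{j} p_j e^{-Np_j}. \tag{3}$$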
j= j= 







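We will show that the approximations are satisfactory for large $N$. We first
establish (1).
Consider

$$\frac{\left|\sum_j \left[1 - (1 - p_j)^N\right] - \sum_j \left(1 - e^{-Np_j}\right)\right|}{\sum_j \left(1 - e^{-Np_j}\right)},$$

where $0 \le p_j \le 1$ and $\sum_j p_j = 1$.
It is easy to see that if $a_i, b_i > 0$, $i = 1, 2, \cdots$, and $a_0/b_0 = \sup_i a_i/b_i$, then

$$\frac{\sum_i a_i}{\sum_i b_i} \le \frac{a_0}{b_0}.$$

Thus

$$\frac{\sum_j \left[1 - (1 - p_j)^N\right] - \sum_j \left(1 - e^{-Np_j}\right)}{\sum_j \left(1 - e^{-Np_j}\right)} \le \sup_p \frac{e^{-Np} - (1 - p)^N}{1 - e^{-Np}}.$$

Since

$$(1 - p)^N = e^{N \log(1 - p)} = e^{-Np - N(p^2/2 + p^3/3 + \cdots)},$$

we have

$$\frac{e^{-Np} - (1 - p)^N}{1 - e^{-Np}} = \frac{e^{-Np}\left(1 - e^{-N(p^2/2 + p^3/3 + \cdots)}\right)}{1 - e^{-Np}}.$$

If $p \ge 1/\sqrt{N}$, then

$$\frac{e^{-Np} - (1 - p)^N}{1 - e^{-Np}} \le \frac{e^{-Np}}{1 - e^{-Np}} \le \frac{e^{-\sqrt{N}}}{1 - e^{-\sqrt{N}}},$$

which clearly tends to zero as $N \to \infty$. For $p < 1/\sqrt{N}$,

$$\frac{e^{-Np} - (1 - p)^N}{1 - e^{-Np}} \le \frac{e^{-Np}\left(1 - e^{-\alpha(Np^2/2)}\right)}{1 - e^{-Np}} = h_N(p),$$

where $\alpha = N/(\sqrt{N} - 1)^2$.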







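Differentiating $\log h_N(p)$ and equating to zero, we have (after dividing by $N$)

$$-\frac{1}{1 - e^{-Np}} + \frac{\alpha p}{e^{\alpha N p^2/2} - 1} = 0,$$

whence we have

$$1 - e^{\alpha N p^2/2} + \alpha p\left(1 - e^{-Np}\right) = 0.$$

Thus

$$\frac{\alpha N p^2}{2} = \log\left(1 + \alpha p\left[1 - e^{-Np}\right]\right),$$

or, since for $p < 1/\sqrt{N}$

$$\log\left(1 + \alpha p\left[1 - e^{-Np}\right]\right) \approx \alpha p\left[1 - e^{-Np}\right],$$

we have

$$\frac{\alpha N p^2}{2} \approx \alpha p\left[1 - e^{-Np}\right] \quad \text{and} \quad 1 - e^{-Np} \approx \frac{Np}{2}.$$

Thus, we can establish that the maximum of $h_N(p)$ occurs when $p \approx 1.6/N$.

However,

$$h_N\!\left(\frac{1.6}{N}\right) = \frac{e^{-1.6}\left(1 - e^{-2.56\alpha/2N}\right)}{1 - e^{-1.6}},$$

which clearly tends to zero as $N \to \infty$. If $p_j = 1$ for some $j$, then all other terms
in both sums in (1) are 0, and the approximation holds trivially. If $p_j = 0$
for some $j$, neither sum is increased, and hence we have established (1).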
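The two numerical facts used above, that the maximizer of $h_N$ sits at $p \approx u/N$ with $u/2 = 1 - e^{-u}$ (so $u \approx 1.6$), and that $h_N$ at that point vanishes as $N$ grows, are easy to confirm. The sketch below bisects for $u$ and evaluates $h_N(u/N)$ with the $\alpha = N/(\sqrt{N} - 1)^2$ of the text; the function names are our own.

```python
import math


def h_N(p, N):
    """h_N(p) = e^{-Np} (1 - e^{-alpha N p^2 / 2}) / (1 - e^{-Np}), alpha = N / (sqrt(N) - 1)^2."""
    alpha = N / (math.sqrt(N) - 1) ** 2
    # -expm1(-z) = 1 - e^{-z}, computed without cancellation for small z
    return math.exp(-N * p) * -math.expm1(-alpha * N * p * p / 2) / -math.expm1(-N * p)


def stationary_u(lo=1.0, hi=2.0, iters=100):
    """Positive root of u/2 = 1 - e^{-u} (sign change on [1, 2]); the maximizer is p ~ u / N."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if 1 - math.exp(-mid) - mid / 2 > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


u = stationary_u()
print(u)  # roughly 1.59, the "1.6" of the text
for N in (10**2, 10**4, 10**6):
    print(N, h_N(u / N, N))  # decreases roughly like 1/N
```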

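In considering (2), we suppose $r$ to be very small compared to $N$. We shall
show that the approximation is satisfactory in the following sense: either both
sides of (2) are negligible for sufficiently large $N$, or the ratio of the error to the
expected value of the number of cells with $r$ elements is small.

Thus, since

$$\binom{N}{r} = \frac{N^r}{r!}\left(1 - \frac{1}{N}\right)\cdots\left(1 - \frac{r - 1}{N}\right) \approx \frac{N^r}{r!}\, e^{-r(r-1)/2N}$$

and

$$(1 - p)^{N-r} \approx \exp\left[-(N - r)p - \frac{(N - r)p^2}{2} - \cdots\right], \quad \text{for } p < 1,$$

we have

$$\sum_j \binom{N}{r} p_j^r (1 - p_j)^{N-r} \approx \sum_j \frac{(Np_j)^r e^{-Np_j}}{r!} \exp\left\{rp_j - \frac{r(r-1)}{2N} - \frac{(N - r)}{2}\, p_j^2 - \cdots\right\}.$$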





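First consider those indices $j$ for which $p_j \ge 1/N^{2/3}$. Then

$$\sum_{p_j \ge 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j}}{r!} \exp\left\{rp_j - \frac{r(r-1)}{2N} - \frac{(N - r)}{2}\, p_j^2 - \cdots\right\} \le \sum_{p_j \ge 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j + r}}{r!}.$$

For $N$ sufficiently large (so that $N^{1/3} > r$, where $x^r e^{-x}$ is decreasing), and since there are at most $N^{2/3}$ such indices,

$$\sum_{p_j \ge 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j + r}}{r!} \le N^{2/3}\, \frac{N^{r/3} e^{-N^{1/3} + r}}{r!},$$

which is negligible for sufficiently large $N$. The approximation holds trivially
for the case $p_j = 1$ for some $j$, and $0$ for all other indices $j$.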
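Now consider

$$\frac{\left|\sum_{p_j < 1/N^{2/3}} \binom{N}{r} p_j^r (1 - p_j)^{N-r} - \sum_{p_j < 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j}}{r!}\right|}{\sum_{p_j < 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j}}{r!}}$$

$$\approx \frac{\sum_{p_j < 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j}}{r!} \left|1 - \exp\left\{rp_j - \frac{r(r-1)}{2N} - \frac{(N - r)}{2}\, p_j^2 - \cdots\right\}\right|}{\sum_{p_j < 1/N^{2/3}} \frac{(Np_j)^r e^{-Np_j}}{r!}}$$

$$\le \sup_{p < 1/N^{2/3}} \left|1 - \exp\left\{rp - \frac{r(r-1)}{2N} - \frac{(N - r)}{2}\, p^2 - \cdots\right\}\right| = \left|1 - e^{O(1/N^{1/3})}\right|,$$

which tends to zero as $N \to \infty$.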


The discussion of (3) is almost identical with that of (2) and will not be reproduced here.
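The per-cell content of approximation (2) can be checked directly: for a cell with small $p$, the ratio of the binomial term $\binom{N}{r}p^r(1-p)^{N-r}$ to its Poisson counterpart $(Np)^r e^{-Np}/r!$ should be close to one, with the error shrinking as $N$ grows inside the regime $p < N^{-2/3}$. A minimal sketch with illustrative values of $N$, $r$, and $p$ follows; the helper names are our own.

```python
import math


def binom_term(N, r, p):
    """Exact binomial probability C(N, r) p^r (1 - p)^(N - r)."""
    return math.comb(N, r) * p ** r * (1 - p) ** (N - r)


def poisson_term(N, r, p):
    """Poisson approximation (Np)^r e^{-Np} / r!."""
    return (N * p) ** r * math.exp(-N * p) / math.factorial(r)


def rel_error(N, r, p):
    """Relative error of the Poisson term as an approximation to the binomial term."""
    return abs(binom_term(N, r, p) / poisson_term(N, r, p) - 1)


for N in (10**4, 10**6):
    p = 0.5 * N ** (-2 / 3)  # a cell inside the p_j < N^(-2/3) regime
    print(N, [rel_error(N, r, p) for r in (1, 2, 3)])
```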


APPENDIX B 


Asymptotic mean square error of $C(1)$. In Section 3, we established
$C(1) = 1 - (n_1/N)$. It is clear that $E[C - C(1)]^2 = E(D - (n_1/N))^2$,
where $D = 1 - C$. From (14) and (15) in Section 2, we have

$$E D^2 = \sum_{i,j} p_i p_j (1 - p_i - p_j)^N + \sum_j p_j^2 \left[(1 - p_j)^N - (1 - 2p_j)^N\right].$$

From (11) in Section 2, we have

$$E n_1^2 = E n_1 + \frac{N!}{(N - 2)!} \sum_{i,j} p_i p_j (1 - p_i - p_j)^{N-2} - \frac{N!}{(N - 2)!} \sum_j p_j^2 (1 - 2p_j)^{N-2}.$$







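Then, to find $E(D n_1)$, we introduce two random variables

$$X_j = \begin{cases} 1, & \text{if the } j\text{th cell does not occur in the sample}, \\ 0, & \text{otherwise}, \end{cases} \qquad Y_j = \begin{cases} 1, & \text{if the } j\text{th cell occurs once in the sample}, \\ 0, & \text{otherwise}. \end{cases}$$

Then $D = \sum_j p_j X_j$ and $n_1 = \sum_j Y_j$. Thus

$$
\begin{aligned}
E(D n_1) &= \sum_{i,j} E(p_i X_i Y_j) = \sum_{i \ne j} N p_i p_j (1 - p_i - p_j)^{N-1} \\
&= \sum_{i,j} N p_i p_j (1 - p_i - p_j)^{N-1} - \sum_j N p_j^2 (1 - 2p_j)^{N-1}.
\end{aligned}
$$

Hence, upon introducing the exponential approximations, we have

$$E\left[C - C(1)\right]^2 \approx \frac{2 E n_2}{N^2} + \frac{E n_1}{N^2}.$$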


REFERENCES 

[1] H. Chernoff and S. Reiter, "Selection of a distribution function to minimize an expectation subject to side conditions," Stanford University Technical Report No. 23, 1954.

[2] A. S. Corbet, R. A. Fisher, and C. B. Williams, "The relation between the number of species and the number of individuals in a random sample of an animal population," J. Anim. Ecol., Vol. 12 (1943), pp. 42-58.

[3] I. J. Good, "The population frequencies of species and the estimation of population parameters," Biometrika, Vol. 40 (1953), pp. 237-264.

[4] I. J. Good and G. H. Toulmin, "The number of new species and the increase in population coverage when a sample is increased," Biometrika, Vol. 43 (1956), pp. 45-63.

[5] L. A. Goodman, "On the estimation of the number of classes in a population," Ann. Math. Stat., Vol. 20 (1949), pp. 572-579.

[6] S. Karlin and L. S. Shapley, "The geometry of moment spaces," Memoirs of the Amer. Math. Soc. No. 12, 1953.

[7] J. S. Rustagi, "On minimizing and maximizing a certain integral with statistical applications," Ann. Math. Stat., Vol. 28 (1957), pp. 308-328.

[8] A. Wald, "Limits of a distribution function determined by absolute moments and inequalities satisfied by absolute moments," Trans. Amer. Math. Soc., Vol. 46 (1939), pp. 280-306.




























ON A GENERAL CONCEPT OF “IN PROBABILITY”¹


By John W. Pratt


Harvard University 


1. Summary. Chernoff [1] has called attention to a paper of Mann and Wald 
[5], which provides a general theory and a convenient notation for the derivation 
of theorems concerning stochastic limits and limit distributions. The present 
paper attempts to clarify the first of these topics, stochastic limits, by applying 
one form of the definition of convergence in probability to any event rather than 
just to convergence. As in the convergence case, most of the reasoning one is 
intuitively disposed to do in this connection is valid. Its justification is made 
more transparent but no more difficult by broadening the applicability of the 
definition. Thus broadened, “in probability” neither implies nor follows from
“with probability one.”


2. Introduction. This section leads up to a general concept of “in probability.”

Suppose $\{x_n\}$, $\{r_n\}$ are sequences of points of the extended real line $[-\infty, \infty]$.
It is customary to write $x_n = o(r_n)$ if $x_n/r_n \to 0$ as $n \to \infty$, and $x_n = O(r_n)$
if $x_n/r_n$ is bounded for large $n$. (Saying “for large $n$” allows a finite number of
$x_n/r_n$ to be infinite or undefined.) Writing out the definitions fully, we have:

$x_n = o(r_n)$ if, for every positive $\eta$, for some $N$, for every $n > N$, $|x_n/r_n| \le \eta$;

$x_n = O(r_n)$ if, for some $\eta$ and $N$, for every $n > N$, $|x_n/r_n| \le \eta$.

(“For some” and “there exist(s) ... such that” are equivalent.)

Suppose now that $\{X_n\}$ is a sequence of random variables on $[-\infty, \infty]$. It is
customary to define $o_p$ and $O_p$ by adding probability requirements to the definitions of $o$ and $O$ as follows.

DEFINITION 1. $X_n = o_p(r_n)$ if, for every positive $\epsilon$ and $\eta$, for some $N$, for every
$n > N$, $P_n\{|X_n/r_n| \le \eta\} \ge 1 - \epsilon$.

$X_n = O_p(r_n)$ if, for every positive $\epsilon$, for some $\eta$ and $N$, for every $n > N$,
$P_n\{|X_n/r_n| \le \eta\} \ge 1 - \epsilon$.

$X_n = o_p(1)$ is also written $X_n \xrightarrow{p} 0$. Note that only the distributions of the
individual $X_n$ are referred to. This is emphasized by the use of $P_n$ to denote
probabilities. The presence of any joint distribution is irrelevant, and indeed
at least one common use of the definition is in connection with asymptotic
distributions, where $n$ is related to the sample size and there is not naturally a
joint distribution at all.

If we fix $\eta$ first in the above definition of $o_p$, we find immediately that it is
equivalent to another common definition:


Received September 3, 1957; revised December 28, 1958. 

1 Research carried out at the Statistical Research Center, University of Chicago, and 
at Harvard University, under sponsorship of the Statistics Branch, Office of Naval Re- 
search. Reproduction in whole or in part is permitted for any purpose of the United States 
Government. 


549 







DEFINITION 1′. $X_n = o_p(r_n)$ if, for every positive $\eta$, $P_n\{|X_n/r_n| \le \eta\} \to 1$ as
$n \to \infty$.

We cannot “fix $\eta$ first” in the definition of $O_p$.

If we fix $\epsilon$ first, we are led, less directly, to less familiar variations:

DEFINITION 1″. $X_n = o_p(r_n)$ if, for every positive $\epsilon$, there exist $c_n$ with
$P_n\{|X_n| \le c_n\} \ge 1 - \epsilon$ such that $c_n = o(r_n)$.

DEFINITION 1‴. $X_n = o_p(r_n)$ if, for every positive $\epsilon$, there exist $S_n$ with
$P_n\{X_n \in S_n\} \ge 1 - \epsilon$ such that $x_n \in S_n$ for all $n$ implies $x_n = o(r_n)$.

Definitions 1″ and 1‴ remain equivalent to Definition 1 if $O$ is substituted for
$o$. For both $o$ and $O$, this equivalence can be proved directly by letting $S_n$ be
$[-c_n, c_n]$ and $c_n$ be the smallest number such that $P_n\{|X_n| \le c_n\} \ge 1 - \epsilon$, i.e.,
the upper-tail $\epsilon$-probability point of $|X_n|$. In Section 5, the equivalence of Definition 1‴ to Definition 1 will be proved for $o$ as Corollary 2 and for $O$ as Corollary
3 of Theorem 6. As far as I know, Definitions 1″ and 1‴ were first stated explicitly by Chernoff. It is proved in [5] that the condition of 1′ implies the condition of 1.
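The quantile choice of $c_n$ in Definition 1″ can be made concrete for an illustrative sequence, assumed here purely for the example: $X_n \sim N(0, 1/n)$. Then $c_n = \Phi^{-1}(1 - \epsilon/2)/\sqrt{n}$ gives $P_n\{|X_n| \le c_n\} = 1 - \epsilon$ exactly, and since $c_n = o(1)$ for every fixed $\epsilon$, Definition 1″ yields $X_n = o_p(1)$ (and, because $\sqrt{n}\,c_n$ stays bounded, $X_n = O_p(n^{-1/2})$). The sketch uses Python's standard `statistics.NormalDist`; the function names are our own.

```python
import math
from statistics import NormalDist


def c_n(eps, n):
    """Upper-tail eps-point of |X_n| for the illustrative choice X_n ~ N(0, 1/n):
    P_n{|X_n| <= c_n} = 2 Phi(c_n sqrt(n)) - 1 = 1 - eps."""
    return NormalDist().inv_cdf(1 - eps / 2) / math.sqrt(n)


def coverage(eps, n):
    """P_n{|X_n| <= c_n}, computed directly from the N(0, 1/n) law."""
    dist = NormalDist(0.0, 1.0 / math.sqrt(n))
    c = c_n(eps, n)
    return dist.cdf(c) - dist.cdf(-c)


for n in (1, 100, 10**4, 10**6):
    print(n, c_n(0.05, n), coverage(0.05, n))  # c_n -> 0 while the coverage stays at 0.95
```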

Definitions 1″ and 1‴ are easier to work with than the original definition
for many purposes because of the way they separate the stochastic and limiting
aspects of the situation. They also seem to me to have at least as much intuitive
meaning and reasonableness. Definition 1‴ suggests immediately the generalization introduced in the next section.


3. Definition and fundamental properties of a general concept of “in probability.” Suppose, for $n = 1, 2, \cdots$, $P_n$ is the distribution of the random variable
$X_n$ in the set $\mathcal{X}_n$, that is, $P_n(X_n \in S_n) = P_n(S_n)$ is a probability measure on the
measurable subsets $S_n$ of $\mathcal{X}_n$. If $S_n$ is a measurable subset of $\mathcal{X}_n$, the event
$X_n \in S_n$ will be called an “$X_n$-event” $E_n$. If $S$ is any subset of the product space
$\mathcal{X} = \times_n \mathcal{X}_n$, the event $(X_1, X_2, \cdots) \in S$ will be called an “$(X_1, X_2, \cdots)$-event” $E$. ($S$ need not be measurable, and indeed measurability need not be
defined for subsets of $\mathcal{X}$.)

DEFINITION 2. The $(X_1, X_2, \cdots)$-event $E$ will be said to “occur in probability,” written $\mathcal{P}(E)$, if, for every positive $\epsilon$, there exist $X_n$-events $E_n$ of probability at least $1 - \epsilon$ such that $E$ occurs whenever all $E_n$ occur.

A more formal version of this, in terms of sets, is

DEFINITION 2′. $\mathcal{P}(S)$ if, for every positive $\epsilon$, there is a sequence $\{S_n\}$ such
that (1) $S_n$ is a measurable subset of $\mathcal{X}_n$, (2) $P_n(S_n) \ge 1 - \epsilon$, and (3)
$\times_n S_n \subset S$.

The concept defined here is, as one would hope, independent of the choice of
underlying random variables from among those possible. More precisely, suppose
$Y_1, Y_2, \cdots$ have given distributions. Then $\mathcal{P}\{(f_1(Y_1), f_2(Y_2), \cdots) \in T\}$ has
the same meaning whether $X_n = Y_n$ or $X_n = f_n(Y_n)$ in Definition 2, that is,
whether the event $(f_1(Y_1), f_2(Y_2), \cdots) \in T$ is regarded as a $(Y_1, Y_2, \cdots)$-event
or as an $(f_1(Y_1), f_2(Y_2), \cdots)$-event. This amounts to the trivial fact that there
are sets $S_n$ with $P_n(Y_n \in S_n) \ge 1 - \epsilon$ such that $(f_1(y_1), f_2(y_2), \cdots) \in T$ whenever $y_n \in S_n$ for all $n$ if and only if there are sets $S_n$ with $P_n\{f_n(Y_n) \in S_n\} \ge 1 - \epsilon$
such that $(f_1(y_1), f_2(y_2), \cdots) \in T$ whenever $f_n(y_n) \in S_n$ for all $n$.

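We now investigate the behavior of $\mathcal{P}$ in connection with some elementary
relations of events such as $E \Rightarrow F$ (which means the event $F$ occurs whenever
the event $E$ occurs). The theorems are also stated in terms of sets. Thus, for
instance, $E \Rightarrow F$ is equivalent to $S \subset T$ if $S$ and $T$ are sets corresponding to
events $E$ and $F$. Proofs given in terms of events can, of course, be directly translated into (perhaps more formal) proofs in terms of sets.

THEOREM 1. If $E \Rightarrow F$, then $\mathcal{P}(E) \Rightarrow \mathcal{P}(F)$.

THEOREM 1′. If $S \subset T$, then $\mathcal{P}(S) \Rightarrow \mathcal{P}(T)$.

This is an immediate consequence of the definition of $\mathcal{P}$.

THEOREM 2. $\mathcal{P}(\text{for all } \alpha, E^\alpha) \Leftrightarrow \text{for all } \alpha, \mathcal{P}(E^\alpha)$, provided the range of $\alpha$ is
countable.

THEOREM 2′. $\mathcal{P}(\bigcap_\alpha S^\alpha) \Leftrightarrow \text{for all } \alpha, \mathcal{P}(S^\alpha)$, provided the range of $\alpha$ is countable.

PROOF. $\Rightarrow$ is an immediate consequence of Theorem 1, whatever the range
of $\alpha$.

To prove $\Leftarrow$, suppose $\alpha = 1, 2, \cdots$, and $\mathcal{P}(E^\alpha)$ for all $\alpha$, and $\epsilon > 0$. For
each $\alpha$, by Definition 2, there are $X_n$-events $E_n^\alpha$ of probability at least $1 - 2^{-\alpha}\epsilon$
such that $E_n^\alpha$ for all $n$ implies $E^\alpha$. Let $E_n \Leftrightarrow \text{for all } \alpha, E_n^\alpha$. Then $E_n$ for all $n$ implies for all $\alpha$, $E^\alpha$. Furthermore,

$$P_n\{E_n\} \ge 1 - \sum_{\alpha=1}^{\infty} P_n\{\text{not } E_n^\alpha\} \ge 1 - \sum_{\alpha=1}^{\infty} 2^{-\alpha}\epsilon = 1 - \epsilon.$$

Since $\epsilon$ was arbitrary, by Definition 2, $\mathcal{P}(\text{for all } \alpha, E^\alpha)$.

THEOREM 3. $\mathcal{P}(\text{for some } \alpha, E^\alpha)$ if, for some $\alpha$, $\mathcal{P}(E^\alpha)$.

THEOREM 3′. $\mathcal{P}(\bigcup_\alpha S^\alpha)$ if, for some $\alpha$, $\mathcal{P}(S^\alpha)$.

Here the range of $\alpha$ is arbitrary. The proof is trivial. (Technically, Theorem 1
is involved.)

The converse of Theorem 3 is false. If it were true, since we have $\mathcal{P}(E \text{ or not } E)$
for every $E$, we would have either $\mathcal{P}(E)$ or $\mathcal{P}(\text{not } E)$ for every $E$, which is
clearly absurd. For instance, let $P_1(X_1 = 0) = \tfrac{1}{2}$ and let $E$ be the event $X_1 = 0$.

$\mathcal{P}(\text{not } E) \Rightarrow \text{not } \mathcal{P}(E)$, for otherwise $\mathcal{P}(E \text{ and not } E)$ by Theorem 2. The
converse, not $\mathcal{P}(E) \Rightarrow \mathcal{P}(\text{not } E)$, is false, for otherwise we would again have
either $\mathcal{P}(E)$ or $\mathcal{P}(\text{not } E)$ for every $E$.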

The following theorem covers Theorems 1-3 and summarizes what can be 
obtained from them. 

TuroreM 4. If ¢(E*) = F, then ¢(@(E*)) = @(F), for any logical combina- 
tion or formula $, provided only that @ involves at most a countable number of and’s 
and no not’s. More precisely, we suppose that @ consists of a finite number of phrases 
“for all” and “‘for some,” each with an index set, which must be countable in the 
case of “for all.” 

For instance, we might have ¢(£*) = “for some A ¢ @, for all a ¢ A, E*,” 
where each A ¢ @ is countable, though @ need not be. 

THEOREM 4′. If $\phi'(S^\alpha) \subset T$, then $\phi(\mathcal{P}(S^\alpha)) \Rightarrow \mathcal{P}(T)$, where $\phi'$ is a finite series of the set operations of intersection (with countable index) and union (with arbitrary index), and $\phi$ is obtained from $\phi'$ by replacing intersection by “for all” and union by “for some.”

552 JOHN W. PRATT

PROOF. $\phi(\mathcal{P}(E^\alpha)) \Rightarrow \mathcal{P}(\phi(E^\alpha))$ by successive application of Theorems 2 and 3, and $\mathcal{P}(\phi(E^\alpha)) \Rightarrow \mathcal{P}(F)$ by Theorem 1.

These theorems may seem like poorly disguised trivialities, as indeed they are. 
My main purpose has been to remove the disguise from some useful trivialities. 
For example, 

THEOREM 5. Suppose that

$$f_n^j(X_n) = O_p(r_n^j), \quad j = 1, \ldots, J,$$
$$g_n^k(X_n) = o_p(s_n^k), \quad k = 1, \ldots, K,$$

and that $h_n(x_n) = O(t_n)$ whenever

$$f_n^j(x_n) = O(r_n^j), \quad j = 1, \ldots, J,$$
$$g_n^k(x_n) = o(s_n^k), \quad k = 1, \ldots, K.$$

Then it follows that $h_n(X_n) = O_p(t_n)$. Furthermore, if $O(t_n)$ is replaced by $o(t_n)$ in the hypothesis, the conclusion is $h_n(X_n) = o_p(t_n)$.

This looks formidable, as does Corollary 1 of [5], of which it is a paraphrase. However, Chernoff reports in [1] that he has found Theorem 5 very useful, and it seems to be about the least general theorem which covers the cases that occur in practice. Both Theorem 5 above and Corollary 1 of [5] are weaker in several respects than Theorem 4; they are covered directly by it, and are proved in the same way once the equivalence of Definitions 1 and 1‴ is established. Realizing that Definition 1 becomes tractable when put in the form 1‴ is the nontrivial part of the reasoning.


4. Examples. It will be proved (Corollaries 2 and 3 of Theorem 6 below), and a direct proof has already been indicated in the next to last paragraph of Section 2, that $f_n(X_n) = O_p(r_n)$ if and only if $\mathcal{P}(f_n(X_n) = O(r_n))$, and likewise for $o_p$ and $o$; that is, Definition 2 is actually a generalization of Definition 1. This fact permits the theorems of the last section to be used to carry out a certain common type of argument. Chernoff [1] gives some examples, and others follow here, as well as a case where the argument is tempting but cannot be used.

4.1. If $Y_n - Y_n' \xrightarrow{p} 0$, $Z_n - Z_n' \xrightarrow{p} 0$, $Y_n' = O_p(1)$, $Z_n' = O_p(1)$, and $f(y, z)$ is (jointly) continuous, then $f(Y_n, Z_n) - f(Y_n', Z_n') \xrightarrow{p} 0$.

To prove this, apply Theorem 4 with $X_n = (Y_n, Y_n', Z_n, Z_n')$; $E^1$ the event $Y_n - Y_n' \to 0$; $E^2$ the event $Z_n - Z_n' \to 0$; $E^3$ the event $\{Y_n'\}$ bounded; $E^4$ the event $\{Z_n'\}$ bounded; $F$ the event $f(Y_n, Z_n) - f(Y_n', Z_n') \to 0$; and $\phi(E^1, E^2, E^3, E^4)$ the simultaneous event $E^1$ and $E^2$ and $E^3$ and $E^4$. The fact used is that a continuous function is uniformly continuous on bounded sets.
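A small Monte Carlo sketch can make 4.1 concrete. It assumes, purely for illustration, that $Y_n'$ and $Z_n'$ are independent standard normals (so both are $O_p(1)$), that $Y_n - Y_n'$ and $Z_n - Z_n'$ are independent normal errors with standard deviation $n^{-1/2}$, and that $f(y, z) = yz$; none of these choices come from the text, and the function name `prob_gap_exceeds` is hypothetical. It estimates $P\{|f(Y_n, Z_n) - f(Y_n', Z_n')| > \delta\}$, which should shrink as $n$ grows.

```python
import random


def prob_gap_exceeds(n, delta=0.1, trials=20000, seed=0):
    """Estimate P{|f(Y_n, Z_n) - f(Y'_n, Z'_n)| > delta} by simulation.

    Illustrative assumptions (not from the paper): Y'_n, Z'_n ~ N(0, 1),
    Y_n - Y'_n and Z_n - Z'_n ~ N(0, 1/n), all independent, and f(y, z) = y*z.
    """
    rng = random.Random(seed)
    scale = n ** -0.5  # standard deviation of Y_n - Y'_n and Z_n - Z'_n

    def f(y, z):
        return y * z  # jointly continuous, but not uniformly continuous on the plane

    hits = 0
    for _ in range(trials):
        y_prime, z_prime = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = y_prime + scale * rng.gauss(0.0, 1.0)
        z = z_prime + scale * rng.gauss(0.0, 1.0)
        if abs(f(y, z) - f(y_prime, z_prime)) > delta:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, prob_gap_exceeds(n))
```

The $O_p(1)$ hypothesis is what lets uniform continuity on a bounded set do the work here, since $f(y, z) = yz$ is not uniformly continuous on the whole plane.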

The treatment of special cases of this example in the literature indicates the value of the approach codified here. For instance, Cramér ([2], p. 255), attributing the result to Slutsky [6], uses the relation between convergence in probability and convergence in distribution to prove that if $Y_n \xrightarrow{p} y, \ldots, Z_n \xrightarrow{p} z$, where $y, \ldots, z$ are constants, and if $f$ is a power of a rational function, then $f(Y_n, \ldots, Z_n) \xrightarrow{p} f(y, \ldots, z)$. (A finite number of arguments $y, \ldots, z$ is no more difficult than two, of course.) Halmos ([3], p. 94, Problem 1) outlines an ingenious proof that $Y_n Z_n - Y_n' Z_n' \xrightarrow{p} 0$ if $Y_n - Y_n' \xrightarrow{p} 0$, $Z_n - Z_n' \xrightarrow{p} 0$, $Y_n' = Y$, and $Z_n' = Z$. (An easy extension to the case $Y_n' = O_p(1)$ and $Z_n' = O_p(1)$, and a slight further argument, would then prove the first sentence of this subsection (4.1) for $f$ a power of a rational function.)

GENERAL CONCEPT OF “IN PROBABILITY” 553

4.2. It is easy to see that $y_n/n \to 0 \Rightarrow \max(y_1, \ldots, y_n)/n \to 0$. A hasty application of Theorem 4 (or Theorem 1) would then lead to the conclusion that $Y_n/n \xrightarrow{p} 0 \Rightarrow \max(Y_1, \ldots, Y_n)/n \xrightarrow{p} 0$.

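To see that this conclusion is false, let the $Y_n$ be independently exponentially distributed with means $\mu_n$, that is, for $t > 0$, $P_n\{Y_n > t\} = \exp(-t/\mu_n)$. Then $Y_n/n \xrightarrow{p} 0$ if (and only if) $\mu_n/n \to 0$. Let $\mu_1$ be arbitrary and $\mu_n = n/\log n$, $n > 1$. Then $Y_n/n \xrightarrow{p} 0$. However, since $\mu_i \le \mu_{2n}$ for $n < i \le 2n$ (when $n \ge 2$),

$$P\{\max(Y_1, \ldots, Y_{2n}) \le 2n\epsilon\} \le \prod_{i=n+1}^{2n} P\{Y_i \le 2n\epsilon\} \le \{1 - (2n)^{-\epsilon}\}^n \to 0 \quad \text{for } \epsilon < 1,$$

so $\max(Y_1, \ldots, Y_n)/n$ does not $\xrightarrow{p} 0$.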

The difficulty is that $\max(Y_1, \ldots, Y_n)/n \xrightarrow{p} 0$ is not equivalent to $\mathcal{P}(\max(Y_1, \ldots, Y_n)/n \to 0)$ for any $X_n$ for which $Y_n/n \xrightarrow{p} 0$ is equivalent to $\mathcal{P}(Y_n/n \to 0)$, so Theorem 4 cannot be applied. If $X_n = Y_n$, then $\max(Y_1, \ldots, Y_n)$ is not a function of $X_n$ alone, so Corollary 2 of Theorem 6 cannot be applied to show $\max(Y_1, \ldots, Y_n)/n \xrightarrow{p} 0$ equivalent to $\mathcal{P}(\max(Y_1, \ldots, Y_n)/n \to 0)$. If $X_n = \max(Y_1, \ldots, Y_n)$, then $Y_n/n \xrightarrow{p} 0$ is similarly not equivalent to $\mathcal{P}(Y_n/n \to 0)$. If $X_n = (Y_1, \ldots, Y_n)$, then $\max(Y_1, \ldots, Y_n)/n \to 0$ is not an event on the whole of the product space of the ranges of the $X_n$, but only on the subspace with points $(y_1;\, y_1, y_2;\, y_1, y_2, y_3;\, \ldots)$.

4.3. Suppose $Y_n - Y \xrightarrow{p} 0$, $Z_n - Z \xrightarrow{p} 0$, and $f(y, z)$ is (jointly) continuous. Then $f(Y_n, Z_n) - f(Y, Z) \xrightarrow{p} 0$.

As in 4.2, we cannot apply Theorem 4 directly. However, letting $(Y_n, Y_n', Z_n, Z_n')$ have the distribution of $(Y_n, Y, Z_n, Z)$ for each $n$, the hypothesis is equivalent to $Y_n - Y_n' \xrightarrow{p} 0$, $Z_n - Z_n' \xrightarrow{p} 0$, and the conclusion to $f(Y_n, Z_n) - f(Y_n', Z_n') \xrightarrow{p} 0$. Thus the desired result follows from 4.1.

Having to restate $Y_n - Y \xrightarrow{p} 0$ as $Y_n - Y_n' \xrightarrow{p} 0$, etc., is no loss mathematically, since the statements are equivalent. Indeed, this very equivalence gives some insight into the problem. It shows that we will have to make use of the uniform continuity of $f$ on bounded intervals. Further, it emphasizes that when $Y_n - Y \xrightarrow{p} 0$, the fact that the value of $Y$ is the same for each $n$ contributes nothing essential. Of course, the fact that the distribution of $Y$ is the same for each $n$ may contribute; indeed, it does, since it gives $Y_n' = O_p(1)$.

4.4. Suppose $Y_n - Y \xrightarrow{p} 0$ and $f$ is continuous. Suppose also $F_n$ is a random function and, for every $M$, $\sup_{|y| \le M} |F_n(y) - f(y)| \xrightarrow{p} 0$. Then $F_n(Y_n) - f(Y) \xrightarrow{p} 0$.

The usefulness of results of this kind is that the limiting distributions of $Y_n$ and $Y_n'$ are the same if $Y_n - Y_n' \xrightarrow{p} 0$. Thus, for example, in 4.4 the problem of finding the limiting distribution of $F_n(Y_n)$ has been reduced to finding that of $f(Y)$, which is ordinarily much easier.

4.5. Suppose $X_n$ is the proportion of successes in $n$ independent binomial trials with probability $p$ of success. Then $X_n - p = O_p(1/\sqrt{n})$. If $f$ is once differentiable at $p$, $f(X_n) = f(p) + f'(p)(X_n - p) + o_p(1/\sqrt{n})$, and

$$\sqrt{n}\,[f(X_n) - f(p)] = f'(p)\,\sqrt{n}\,(X_n - p) + o_p(1).$$

Now the asymptotic distribution of $\sqrt{n}(X_n - p)$ has variance $p(1 - p)$, so the asymptotic distribution of $\sqrt{n}[f(X_n) - f(p)]$ will have variance independent of $p$ if and only if $f'(p)$ is a multiple of $1/(p(1 - p))^{1/2}$. This gives $f(p) = \arcsin\sqrt{p}$, up to affine transformation. Note that we have not proved anything, except heuristically, about the limit of the variance of $\sqrt{n}[f(X_n) - f(p)]$, even though $\sqrt{n}(X_n - p)$ has variance $p(1 - p)$ for every $n$. This is because $E(o_p(1))$ is not necessarily $o(1)$. On the other hand, what we really want, often, when we ask for asymptotically constant variance is that the variance of the asymptotic distribution shall be constant, since in this case an $F$ statistic based on the transformed variates will have an $F$ distribution asymptotically. This justification of the arcsin transformation has no bearing, of course, if the purpose of the transformation is to make some effects additive.
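A short check of the variance-stabilizing computation in 4.5, under the stated binomial model. `asymptotic_variance` evaluates $f'(p)^2 p(1-p)$ for $f(p) = \arcsin\sqrt{p}$, whose derivative is $1/(2\sqrt{p(1-p)})$, so the value is $1/4$ for every $p$. `exact_scaled_variance` (a hypothetical helper) computes $n \operatorname{Var}[\arcsin\sqrt{X_n}]$ exactly from the binomial probabilities; its closeness to $1/4$ for moderate $n$ is an observation about this example, not something the heuristic argument above proves.

```python
import math


def f_prime(p):
    """Derivative of f(p) = arcsin(sqrt(p)), namely 1 / (2 sqrt(p (1 - p)))."""
    return 1.0 / (2.0 * math.sqrt(p * (1.0 - p)))


def asymptotic_variance(p):
    """Variance of the limit law of sqrt(n)[f(X_n) - f(p)]: f'(p)^2 p (1 - p)."""
    return f_prime(p) ** 2 * p * (1.0 - p)


def exact_scaled_variance(n, p):
    """n * Var[arcsin(sqrt(X_n))], X_n = Binomial(n, p) count / n, by direct summation."""
    mean = second = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        value = math.asin(math.sqrt(k / n))
        mean += prob * value
        second += prob * value * value
    return n * (second - mean * mean)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        print(p, asymptotic_variance(p), exact_scaled_variance(400, p))
```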

4.6. If $X_n$-events $E_n(\epsilon)$ are given with $P_n\{E_n(\epsilon)\} \ge 1 - \epsilon$, then $\mathcal{P}\{\text{for some } \epsilon, \text{ for all } n, E_n(\epsilon)\}$. This remark is trivial, but it is technically involved, along with Corollaries 2 and 3 below, in showing that Theorem 4 of this paper includes Theorem 1 and Corollary 1 of the Mann-Wald paper [5].


5. A theorem showing the equivalence of the definitions. It will follow from Theorem 6 that $X_n \xrightarrow{p} 0$ is equivalent to $\mathcal{P}(X_n \to 0)$ and $X_n = O_p(1)$ to $\mathcal{P}(X_n = O(1))$. These facts are a little more easily proved directly. However, Theorem 6 and Corollary 1 may be of interest in themselves.


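THEOREM 6. Suppose that, for every $n$ and $\alpha$, $E_n^\alpha$ is an $X_n$-event and $E_n^\alpha \Rightarrow E_n^\beta$ if $\beta > \alpha$. Then

$$P_n(E_n^\alpha) \to 1 \text{ uniformly in } n \text{ as } \alpha \to \infty$$
$$\Leftrightarrow \inf_n P_n(E_n^\alpha) \to 1 \text{ as } \alpha \to \infty$$
$$\Leftrightarrow \mathcal{P}(\text{for some } \alpha, \text{ for all } n, E_n^\alpha).$$

(We have in mind that the range of $\alpha$ is the positive integers, although the proof applies to more general partially ordered sets.)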

PROOF. The first $\Leftrightarrow$ is immediate.

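To prove the second $\Rightarrow$, suppose $\inf_n P_n(E_n^\alpha) \to 1$ as $\alpha \to \infty$. Let $\epsilon$ be an arbitrary positive number. There exists $\sigma$ such that, for all $\alpha \ge \sigma$ and all $n$, $P_n(E_n^\alpha) \ge 1 - \epsilon$. Let $E_n = E_n^\sigma$. Then $P_n(E_n) \ge 1 - \epsilon$. Furthermore, $E_n$ for all $n$ $\Rightarrow$ for some $\alpha$ (namely $\sigma$), for all $n$, $E_n^\alpha$. Therefore, by Definition 2, $\mathcal{P}(\text{for some } \alpha, \text{ for all } n, E_n^\alpha)$.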

To prove the second $\Leftarrow$, suppose $\mathcal{P}(\text{for some } \alpha, \text{ for all } n, E_n^\alpha)$, and let $\epsilon$ be an arbitrary positive number. By Definition 2, there exist $E_n$ with $P_n(E_n) \ge 1 - \epsilon$ such that $E_n$ for all $n$ $\Rightarrow$ for some $\alpha$, for all $n$, $E_n^\alpha$.

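Fix $m$ arbitrarily. $E_n$ for all $n$ $\Rightarrow$ for some $\alpha$, $E_m^\alpha$. Therefore, the other $E_n$ being irrelevant, $E_m \Rightarrow$ for some $\alpha$, $E_m^\alpha$. Since $E_m^\alpha \Rightarrow E_m^\beta$ for $\beta > \alpha$, it follows that the event ($E_m$ but not $E_m^\alpha$) decreases to the null event as $\alpha \to \infty$. Then there is an $x_m$ where $E_m$ occurs but $E_m^\alpha$ occurs only for $\alpha$ so large that $P_m(E_m \text{ but not } E_m^\alpha) < \epsilon$.²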

Suppose $x_m$ has been so chosen for every $m$. At $(x_1, x_2, \ldots)$, $E_m$ occurs for all $m$, and, consequently, there exists $\sigma$ such that, for all $m$, $E_m^\sigma$ occurs. For $\alpha \ge \sigma$, $E_m^\alpha$ occurs at $x_m$, whence $P_m(E_m \text{ but not } E_m^\alpha) < \epsilon$. Therefore, for all $m$ and all $\alpha \ge \sigma$, $P_m(E_m^\alpha) \ge P_m(E_m) - \epsilon \ge 1 - 2\epsilon$. But $\epsilon$ was arbitrary and $\sigma$ did not depend on $m$. Therefore $P_m(E_m^\alpha) \to 1$ uniformly in $m$ as $\alpha \to \infty$.

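COROLLARY 1. $P_n(E_n) \to 1$ as $n \to \infty$ $\Leftrightarrow$ $\mathcal{P}(\text{for some } \alpha, \text{ for all } n \ge \alpha, E_n)$.

PROOF. Let $E_n^\alpha = E_n$ for $n \ge \alpha$, let $E_n^\alpha$ be the universal event $X_n \in \mathcal{X}_n$ otherwise, and apply Theorem 6.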

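COROLLARY 2. $f_n(X_n) = o_p(r_n) \Leftrightarrow \mathcal{P}(f_n(X_n) = o(r_n))$.

PROOF. Let $Y_n = |f_n(X_n)/r_n|$. Then $f_n(X_n) = o_p(r_n)$ $\Leftrightarrow$ for every positive $\eta$, $P_n\{Y_n \le \eta\} \to 1$ as $n \to \infty$ $\Leftrightarrow$ for every $\alpha$, $P_n\{Y_n \le 1/\alpha\} \to 1$ as $n \to \infty$ $\Leftrightarrow$ for every $\alpha$, $\mathcal{P}(\limsup Y_n \le 1/\alpha)$ $\Leftrightarrow$ $\mathcal{P}(\text{for every } \alpha, \limsup Y_n \le 1/\alpha)$ $\Leftrightarrow$ $\mathcal{P}(f_n(X_n) = o(r_n))$. (The range of $\alpha$ is the positive integers.) The first $\Leftrightarrow$ is Definition 1, the second is immediate, the third follows from Corollary 1, the fourth follows from Theorem 2, and the last is virtually definition.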

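COROLLARY 3. $f_n(X_n) = O_p(r_n) \Leftrightarrow \mathcal{P}(f_n(X_n) = O(r_n))$.

PROOF. Let $Y_n = |f_n(X_n)/r_n|$. Then $f_n(X_n) = O_p(r_n)$ $\Leftrightarrow$ for every positive $\epsilon$, for some $\eta$ and $N$, for every $n > N$, $P_n\{Y_n \le \eta\} \ge 1 - \epsilon$

$$\Leftrightarrow \inf_{n > N} P_n\{Y_n \le N\} \to 1 \text{ as } N \to \infty$$

$\Leftrightarrow$ $\mathcal{P}(\text{for some } N, \text{ for all } n > N, Y_n \le N)$ $\Leftrightarrow$ $\mathcal{P}(f_n(X_n) = O(r_n))$. The first $\Leftrightarrow$ is Definition 1, the second is immediate, the third follows from Theorem 6, and the last is virtually definition.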


6. Relation to Probability One. One might ask what relation there is between $\mathcal{P}(E)$ and $\Pr(E) = 1$. This question is not entirely natural, in that $\Pr(E) = 1$ refers to the joint distribution of the $X_n$, while $\mathcal{P}(E)$ refers only to their individual (marginal) distributions. The joint distribution may sometimes be changed so that $\Pr(E)$ is changed without changing the marginals. For example, suppose $X_1, X_2, \ldots$ have the standard normal distribution. If they are independent, $\Pr(X_n \text{ diverges as } n \to \infty) = 1$. If $X_1 = X_2 = \cdots$, $\Pr(X_n \text{ diverges as } n \to \infty) = 0$. In any case, $\mathcal{P}(X_n \text{ diverges as } n \to \infty)$. To prove the latter, let $E_n$ in Definition 2 be the event that $X_n$ is not between $a_n$ and $a_{n+1}$, where $a_n$ takes on, in rotation, the $k\epsilon/2$-points of the standard normal distribution.


² The range of $\alpha$ must be a directed system for “$\alpha \to \infty$” to have meaning. Further restriction of the range of $\alpha$, and the hypothesis that for every $n$, $E_n^\alpha \Rightarrow E_n^\beta$ if $\beta > \alpha$, are required only to establish this statement, and hence only to prove the second $\Leftarrow$, which is the least trivial part of the theorem.


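As another example of what can happen, let $X_n$ be normal with mean 0 and variance $\epsilon_n^2$, and let $X_1, X_2, \ldots$ be independent. Then

$$X_n \xrightarrow{p} 0 \quad \text{if and only if} \quad \epsilon_n \to 0,$$
$$\Pr(X_n \to 0) = 1 \quad \text{if, for every positive } \epsilon, \ \textstyle\sum_n \Phi(\epsilon/\epsilon_n) < \infty,$$
$$\Pr(X_n \text{ diverges}) = 1 \quad \text{if, for some positive } \epsilon, \ \textstyle\sum_n \Phi(\epsilon/\epsilon_n) = \infty,$$

where $\Phi(t)$ is the tail area above $t$ of the standard normal distribution. Letting $E$ be the event that $X_n$ diverges, we see that we may have, even for independent $X_1, X_2, \ldots$, $\Pr(E) = 1$ yet $\mathcal{P}(\text{not } E)$.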
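Both halves of the last statement can hold at once only if $\epsilon_n \to 0$ slowly enough that $\sum_n \Phi(\epsilon/\epsilon_n)$ still diverges for some $\epsilon$. The sketch below uses the illustrative choice $\epsilon_n = (\log n)^{-1/2}$ for $n \ge 2$ (an assumption, not the author's example) and $\epsilon = 1/2$; then $\Phi(\epsilon/\epsilon_n)$ behaves roughly like $n^{-\epsilon^2/2}$ up to a logarithmic factor, so the partial sums grow without bound even though $\epsilon_n \to 0$. Here $\Phi$ is the upper tail, as in the text; the helper names are hypothetical, and partial sums can only suggest divergence, not prove it.

```python
import math


def upper_tail(t):
    """Phi(t) in the paper's notation: the standard normal tail area above t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def eps_n(n):
    """Illustrative standard deviations eps_n = (log n)^(-1/2), n >= 2; they tend to 0."""
    return 1.0 / math.sqrt(math.log(n))


def partial_sum(N, eps=0.5):
    """sum_{n=2}^{N} Phi(eps / eps_n); each term is half of P{|X_n| > eps}."""
    return sum(upper_tail(eps / eps_n(n)) for n in range(2, N + 1))


if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5):
        print(N, eps_n(N), partial_sum(N))
```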

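There are some events for which this is not the case. For example,

THEOREM 7. If, for every $\alpha$, $\Pr(E^\alpha) = 1 \Rightarrow \mathcal{P}(E^\alpha)$, then $\Pr(\text{for every } \alpha, E^\alpha) = 1 \Rightarrow \mathcal{P}(\text{for every } \alpha, E^\alpha)$, provided the range of $\alpha$ is denumerable.

This is an immediate consequence of Theorem 2, since $\Pr(\text{for every } \alpha, E^\alpha) = 1$ implies $\Pr(E^\alpha) = 1$ for every $\alpha$.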

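THEOREM 8. Suppose that, for every $n$, $E_n^\alpha \Rightarrow E_n^\beta$ if $\beta > \alpha$. Then $\Pr(\text{for some } \alpha, \text{ for every } n, E_n^\alpha) = 1 \Rightarrow \mathcal{P}(\text{for some } \alpha, \text{ for every } n, E_n^\alpha)$.

(The same range of $\alpha$ is possible here as in Theorem 6.)

PROOF. The hypotheses imply that $\Pr(\text{for every } n, E_n^\alpha) \to 1$ as $\alpha \to \infty$. It follows that $\inf_n P_n(E_n^\alpha) \to 1$ as $\alpha \to \infty$. Theorem 6 completes the proof.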

These two theorems may be applied successively to prove $\Pr(E) = 1 \Rightarrow \mathcal{P}(E)$ for many events $E$. For instance, they cover the facts that $\Pr\{X_n = o(r_n)\} = 1 \Rightarrow \mathcal{P}(X_n = o(r_n))$ and that $\Pr\{X_n = O(r_n)\} = 1 \Rightarrow \mathcal{P}(X_n = O(r_n))$.

Theorems 7 and 8 do not restrict the joint distribution of the $X_n$. Thus the second sentence of Theorem 8 could be changed to: If the $X_n$ have given distributions and these distributions are the marginals of some joint distribution of the $X_n$ under which $\Pr(\text{for some } \alpha, \text{ for all } n, E_n^\alpha) = 1$, then $\mathcal{P}(\text{for some } \alpha, \text{ for all } n, E_n^\alpha)$. A similar rewording of Theorem 7 is possible.


7. Further generalizations. Definition 2 and Theorems 1–4 generalize immediately to the case that $n$ ranges over some entirely arbitrary set.

A (perhaps more interesting) generalization is to do away with product spaces. This permits the consideration of events not defined on the product space, both those depending (say) on a further variable as well, and those not defined on the whole space, such as $\max(Y_1, \ldots, Y_n)/n \to 0$ with $X_n = (Y_1, \ldots, Y_n)$. Suppose, then, that for each $n$, $P_n$ is a probability measure on a Borel field $\mathcal{A}_n$ of subsets of a set $\mathcal{X}$.

DEFINITION 3. $\mathcal{P}(S)$ for a subset $S$ of $\mathcal{X}$ if, for every $\epsilon > 0$, there exist $S_n \in \mathcal{A}_n$ such that $P_n(S_n) \ge 1 - \epsilon$ for all $n$ and $\bigcap_n S_n \subset S$.

Theorems 1′–4′ continue to hold. However, some curious things can nevertheless happen. For instance, $\mathcal{P}(S)$ holds for every $S$ in the following case, essentially the case that $P_n$ is the joint distribution of $(X_1, X_n)$.

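THEOREM 9. Let $\mathcal{A}_n$ be a Borel field of subsets of $\mathcal{X}_n$, $n = 1, 2, \ldots$; let $\mathcal{X} = \times_{n=1}^{\infty} \mathcal{X}_n$; let $\mathcal{B}_n$ be the smallest Borel field including all subsets of $\mathcal{X}$ of the form $\{x : x_1 \in A_1, x_n \in A_n\}$, $A_1 \in \mathcal{A}_1$, $A_n \in \mathcal{A}_n$; let $Q$ be a probability measure on $\mathcal{A}_1$; let $P_n$ be a probability measure on $\mathcal{B}_n$ such that $P_n\{x : x_1 \in A_1\} = Q(A_1)$ for $A_1 \in \mathcal{A}_1$. Then $\mathcal{P}(S)$ for every $S$, provided only that $Q$ is non-atomic.

$Q$ is essentially the distribution of $X_1$.

PROOF. Given $\epsilon > 0$, choose $T_n \in \mathcal{A}_1$ such that $Q(T_n) \le \epsilon$ and $\bigcup_n T_n = \mathcal{X}_1$ (possible because $Q$ is non-atomic). Let $x \in S_n$ if and only if $x_1 \notin T_n$. Then $P_n(S_n) = 1 - P_n\{x : x_1 \in T_n\} \ge 1 - \epsilon$ for all $n$, yet $\bigcap_n S_n$ is empty. Hence, by Definition 3, $\mathcal{P}(S)$ for every $S$.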

Thus we can now have both $\mathcal{P}(S)$ and $\mathcal{P}(\mathcal{X} - S)$, but, by Theorem 2′, only if we have $\mathcal{P}(\text{null set})$, or equivalently, by Theorem 1′, $\mathcal{P}(S)$ for all $S$.

Theorem 9 makes it amply clear that its hypothesis does not provide a situation in which $X_n - X_1 \xrightarrow{p} 0$ is equivalent to $\mathcal{P}(X_n - X_1 \to 0)$, and the generality of Definition 3 does not seem, as one might hope, to permit cases like that of Example 4.3 to be handled by Theorem 4 without introduction of new random variables like $Y_n'$.

The following question seems to me natural and interesting, but I have not pursued it. Suppose $X_n$ is a function from $\mathcal{X}$ to the real line and $\mathcal{A}_n$ is the Borel field of inverse images under $X_n$ of Borel sets. Suppose $P_n$ is a probability measure on $\mathcal{A}_n$. What restrictions are required to make $X_n \xrightarrow{p} 0$ and $\mathcal{P}(X_n \to 0)$ equivalent? We have seen that it is certainly sufficient that $X_n$ be the $n$th coordinate of $\mathcal{X}$.


8. Miscellaneous remarks.
8.1. A stochastic process $X_t$ is called continuous in probability if, for every $u$,

$$X_t - X_u \xrightarrow{p} 0 \quad \text{as } t \to u.$$

The statement $\mathcal{P}(X_t \text{ continuous})$, which might also be read “$X_t$ continuous in probability,” has no meaning as it stands. If $t$ is to take the place of $n$ in Definition 2, then for each $t$, $X_t$ must be a random function of another variable. If “continuous” means “continuous in $t$,” then a collection of stochastic processes is required, one for each $n$ in Definition 2. In both cases $X_t$ refers to something more complicated than a simple stochastic process, and the statement $\mathcal{P}(X_t \text{ continuous})$ is therefore, however interpreted, quite different from the statement that (the stochastic process) $X_t$ is continuous in probability.

8.2. I have not been able to see that the point of view explored in this paper throws any light on certain problems which concern convergence in probability and involve pairwise or joint distributions. For example, a stochastic process which is continuous in probability on a finite closed interval is uniformly continuous in probability thereon, that is, for every positive $\epsilon$ and $\eta$, for some $\delta$, $\Pr\{|X_t - X_u| \le \eta\} \ge 1 - \epsilon$ if $|t - u| \le \delta$ (see Lévy [4], pp. 36–37). Again, if $X_n - X_m \xrightarrow{p} 0$ as $n, m \to \infty$, then there is a random variable $Y$ such that $X_n \xrightarrow{p} Y$ (see Halmos [3], p. 93). Again, $X_n \xrightarrow{p} 0$ if and only if every subsequence has a subsequence which $\to 0$ with probability one. Pairwise distributions are involved in the first two cases. The last is interesting in that a condition which involves a joint distribution is equivalent to one which does not. All three are most easily proved directly from Definition 1.


Acknowledgements. I am indebted to Herman Chernoff who aroused my 
interest in this field when, as a student at Stanford, at his suggestion, I wrote 
up lectures in which he discussed and amplified [5]. 

I am also most grateful to Sudhish Ghurye for very stimulating and helpful 
discussions and suggestions. 


REFERENCES

[1] H. Chernoff, “Large sample theory: Parametric case,” Ann. Math. Stat., Vol. 27 (1956), pp. 1–22.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.

[3] P. R. Halmos, Measure Theory, D. Van Nostrand, New York, 1950.

[4] P. Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, 1948.

[5] H. B. Mann and A. Wald, “On stochastic limit and order relationships,” Ann. Math. Stat., Vol. 14 (1943), pp. 217–226.

[6] E. Slutsky, “Über stochastische Asymptoten und Grenzwerte,” Metron, Vol. 5 (1925), p. 3.





DENSITIES FOR STOCHASTIC PROCESSES¹


By CHARLOTTE T. STRIEBEL²
University of California, Berkeley


1. Introduction and summary. Let $\{x_\theta(t), \theta \in \Omega\}$ be a family of stochastic processes defined by their finite dimensional distributions; that is, $\{F_\theta[x(t_1), \ldots, x(t_n)];\ \theta \in \Omega\}$ is given for all finite sets of time points $t_1, \ldots, t_n$. A general procedure for treating a statistical problem concerning this family has been to solve the problem for the finite dimensional families and then see what happens to the solution when limits are taken over suitably selected sets of time points. For example, if $\hat\theta[x(t_1), \ldots, x(t_n)]$ is an estimate of $\theta$ based on the finite dimensional family and it can be shown that the limit

$$\hat\theta[x(t_1), \ldots, x(t_n)] \to \hat\theta[x(t)]$$

exists in some sense and is independent of the defining set $(t_1, t_2, \ldots)$, then this limit will usually provide an adequate estimate of $\theta$ for the process. Frequently the properties of the estimates $\hat\theta[x(t_1), \ldots, x(t_n)]$ can be extended to $\hat\theta[x(t)]$.

An alternative approach to the problem is proposed by Grenander [1]. He 
introduces the likelihood ratio of two processes P and Q restricted to a finite 
number of time points, shows that it converges to a limit as the number of points 
goes to infinity and that this limit is the density of P with respect to Q if this 
density exists. He uses these results to derive numerous statistical results. 

The only criterion which he gives for the existence of the density is that the 
limit of the likelihood ratio be finite a.s. P. In applying this criterion he must 
always make use of some additional knowledge of the processes such as a.s. 
existence of certain integrals. In Section 2 these results are established very 
simply using the theory of martingales, and a criterion for the existence of the 
density is given which proves convenient in several applications. A condition 
also is given under which a density computed for a countable number of time 
points is valid for the continuous parameter process. Once the existence of the 
density is established, standard statistical techniques can be applied directly.
For example, sufficient statistics can often be found by inspection, or maximum 
likelihood methods can be used. 

Densities for a normal process with continuous covariance and unknown mean 
value function are derived in Section 3. Minimum variance unbiased estimates 
of regression coefficients are obtained. 

In [2] Cameron and Martin consider processes which are obtained from a Wiener process by linear transformations. They state conditions under which such a process is absolutely continuous with respect to the Wiener process, and they give a formula for computing the density. These results can be applied to the Ornstein-Uhlenbeck process with covariance $\sigma^2 e^{-\beta|t-s|}$ to obtain a family of densities for $2\beta\sigma^2 =$ constant. In Section 4 this family of densities is derived using the methods of Section 2. Using the same techniques it is shown that this family of Ornstein-Uhlenbeck processes is mutually absolutely continuous. The maximum likelihood estimate of the correlation parameter $\beta$ is computed.

Received November 21, 1957; revised November 7, 1958.

¹ This paper was prepared with the support of the Office of Ordnance Research, U. S. Army under Contract DA-04-200-ORD-171.

² Now at Lockheed Aircraft Corp., Missiles and Space Division, Palo Alto, California.

559

560 CHARLOTTE T. STRIEBEL


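2. Existence and computation of densities. Let $\mathcal{X}^\infty = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots$ be a countably dimensional Euclidean space and $\mathcal{B}^\infty$ the Borel $\sigma$-field over this space. Denote by $\mathcal{B}^n$ the subfield of $\mathcal{B}^\infty$ consisting of all cylinder sets with bases in $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n$. If $P$ is a measure over $\{\mathcal{X}^\infty, \mathcal{B}^\infty\}$, $P^n$ will denote this measure restricted to $\mathcal{B}^n$.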

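LEMMA 1. Let $P$ and $Q$ be two probability measures on $\{\mathcal{X}^\infty, \mathcal{B}^\infty\}$ such that $P^n$ is absolutely continuous with respect to $Q^n$ for $n = 1, 2, \ldots$. Let $dP^n/dQ^n = f^n$. Then $\{f^n, \mathcal{B}^n, n = 1, 2, \ldots\}$ is a martingale on the probability space $\{\mathcal{X}^\infty, \mathcal{B}^\infty, Q\}$.

PROOF. For $m > n$, $\mathcal{B}^m \supset \mathcal{B}^n$, and hence for all $A^n \in \mathcal{B}^n$

$$\int_{A^n} f^m \, dQ = P(A^n) = \int_{A^n} f^n \, dQ.$$

Since $f^n$ is certainly $\mathcal{B}^n$-measurable, it follows that $E[f^m \mid \mathcal{B}^n] = f^n$.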
THEOREM 1. Under the assumptions of Lemma 1:
(i) $f^n \to f$ a.s. $Q$.
(ii) $E|f^n - f| \to 0$ if and only if $P$ is absolutely continuous with respect to $Q$ on $\mathcal{B}^\infty$; and then $dP/dQ = f$ a.s. $Q$.
(iii) For $r \ge 1$, $E(f^n)^r \uparrow c_r$.
(iv) If $c_r < \infty$ for some $r > 1$, then the conditions of (ii) hold.


PROOF. Since $E|f^n| = E(f^n) = 1$, $n = 1, 2, \ldots$, (i) follows from the martingale convergence theorem. If $dP/dQ = f'$ exists, then by the argument used in the proof of Lemma 1, $f'$ closes the martingale. Since $E(f') = 1$, it follows from the martingale closure theorem that $f^n \to f'$ in the first mean and a.s. $Q$. Thus by (i), $f = f'$ a.s. $Q$. If $E|f^n - f| \to 0$, again by the closure theorem $f$ closes the martingale; that is, $E[f \mid \mathcal{B}^n] = f^n$ a.s. $Q$, $n = 1, 2, \ldots$. Thus for $A \in \bigcup_n \mathcal{B}^n$

$$\int_A f \, dQ = \int_A f^n \, dQ = P(A). \tag{1}$$

The extension from $\bigcup_n \mathcal{B}^n$ to $\mathcal{B}^\infty$ is unique, so (1) also holds over $\mathcal{B}^\infty$. Thus $dP/dQ = f$ a.s. $Q$. (iii) is a consequence of the convexity property of conditional expectations, and (iv) follows from the closure theorem.
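A toy instance of Lemma 1 and Theorem 1, chosen only for illustration and not taken from the paper: let $Q$ make the coordinates $x_1, x_2, \ldots$ independent standard normal and $P$ make them independent $N(\theta_i, 1)$. Then $f^n = \exp(\sum_{i \le n} \theta_i x_i - \tfrac12 \sum_{i \le n} \theta_i^2)$, and a direct Gaussian computation gives $E_Q(f^n)^r = \exp\{\tfrac12 r(r-1)\sum_{i \le n}\theta_i^2\}$. This increases in $n$, as (iii) says, and stays bounded (so $c_r < \infty$ and (iv) applies) exactly when $\sum \theta_i^2 < \infty$. The sketch takes the hypothetical shifts $\theta_i = 1/(2i)$ and compares the closed form with a Monte Carlo average.

```python
import math
import random


def theta(i):
    """Illustrative mean shifts theta_i = 1/(2i); their squares sum to pi^2 / 24."""
    return 0.5 / i


def moment_closed_form(n, r):
    """E_Q (f^n)^r = exp(r (r - 1) / 2 * sum_{i <= n} theta_i^2) in this Gaussian example."""
    s = sum(theta(i) ** 2 for i in range(1, n + 1))
    return math.exp(0.5 * r * (r - 1) * s)


def moment_monte_carlo(n, r, draws=20000, seed=1):
    """Average of (f^n)^r over draws of x_1, ..., x_n from Q (independent N(0, 1))."""
    rng = random.Random(seed)
    half_sq = 0.5 * sum(theta(i) ** 2 for i in range(1, n + 1))
    total = 0.0
    for _ in range(draws):
        log_f = sum(theta(i) * rng.gauss(0.0, 1.0) for i in range(1, n + 1)) - half_sq
        total += math.exp(r * log_f)
    return total / draws


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, moment_closed_form(n, 2.0), moment_monte_carlo(n, 1.0))
```

For $r = 1$ the closed form is identically 1, which is just the martingale property $E_Q f^n = 1$.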

The theorems on martingales used for this and the preceding lemma are 
Theorem 4.1 (i), (ii), and (iii) of Doob [3], Chapter 7. In Section 8 of this 
chapter, Doob obtains results of which Lemma 1 and Theorem 2 are easy 
extensions. 





DENSITIES FOR STOCHASTIC PROCESSES 561 


In order to apply these results to find densities for stochastic processes with a continuous time parameter, the step must be taken from $\mathcal{X}^\infty$ to $\mathcal{X}(T)$, the space of real-valued functions of $t$. The set $T$ is assumed to be an interval, infinite or not, of the real line. In the space $\mathcal{X}(T)$, let $\mathcal{B}$ be the Borel $\sigma$-field over cylinder sets with bases Borel in finite dimensional subspaces. For some specified set $D = (t_1, t_2, \ldots)$, $\mathcal{B}^D$ and $\mathcal{B}^n$ will denote the subfields of cylinder sets with bases in $\prod_{t \in D} \mathcal{X}_t$ and $\prod_{i=1}^{n} \mathcal{X}_{t_i}$, respectively.

THEOREM 2. Let $\{\mathcal{X}(T), \mathcal{B}, P\}$ and $\{\mathcal{X}(T), \mathcal{B}, Q\}$ be two stochastic processes such that $P$ is absolutely continuous with respect to $Q$ over $\mathcal{B}$ and the $Q$ process is continuous in probability. Then for any set $D = (t_1, t_2, \ldots)$ dense in $T$, the derivatives $dP^D/dQ^D$ and $dP/dQ$ coincide a.s. $Q$.

PROOF. Let $g$ be a version of the derivative with respect to $\mathcal{B}$. Since it is integrable, there exists a sequence of simple functions $g_n[x(s_1), x(s_2), \ldots]$ such that $g_n \to g$ a.s. $Q$. Let $D = (t_1, t_2, \ldots)$ be an arbitrary dense set in $T$. Since the process is continuous in probability for $Q$, for each $s_i$ there exists a subsequence $\{t_{ij}\} \subset D$ such that $x(t_{ij}) \to x(s_i)$ a.s. $Q$. Thus each $g_n$, and hence $g$, is a.s. $Q$ equal to a $\mathcal{B}^D$-measurable function. This implies that $g = dP^D/dQ^D$ a.s. $Q$.

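Now consider a family $\{P_\theta, \theta \in \Omega\}$ of probability measures over $\{\mathcal{X}(T), \mathcal{B}\}$. For a set $D$ dense in $T$, define

$$\frac{dP_\theta^D}{d(P_\theta^D + P_{\theta'}^D)} = f_{\theta\theta'}$$

for all $\theta, \theta' \in \Omega$.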


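COROLLARY 1. If $\{\mathcal{X}(T), \mathcal{B}, P_\theta\}$ is continuous in probability for each $\theta \in \Omega$, then

$$\frac{dP_\theta}{d(P_\theta + P_{\theta'})} = f_{\theta\theta'} \quad \text{a.s. } P_\theta + P_{\theta'}.$$

A statistic $S[x(t)]$ is pairwise sufficient for the family $\{P_\theta, \theta \in \Omega\}$ if and only if for all $\theta, \theta' \in \Omega$, $f_{\theta\theta'}$ is a.s. $(P_\theta + P_{\theta'})$ a function of $S[x(t)]$.

This result is immediate from Theorems 2 and 3 and Theorem 2 of Halmos and Savage [4].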

When the processes $P_\theta$ are defined by their finite dimensional distributions, $f_{\theta\theta'}$ can easily be computed. If it can be shown that the family is dominated by a $\sigma$-finite measure, then the statistic found in this manner is also sufficient.


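3. Regression parameters for a normal process. Let $x(t)$ be a normal process with nonsingular continuous covariance $C(u, v)$ and mean value function

$$m(t) = \sum_{i=1}^{s} k_i \phi_i(t),$$

where the $\phi_i$ are known continuous functions and the $k_i$ are parameters. Let $P_0$ be the process for which $k_1 = \cdots = k_s = 0$. For each parameter point $k = (k_1, \ldots, k_s)$ define $dP_k^n/dP_0^n = f_k^n$. For a fixed set $D = (t_1, t_2, \ldots)$ dense in $T$, let

$$\Phi^n(\phi_i, \phi_j) = \sum_{\alpha, \beta = 1}^{n} \phi_i(t_\alpha)\,\phi_j(t_\beta)\, C^{-1}(t_\alpha, t_\beta),$$

where $C^{-1}(t_\alpha, t_\beta)$ denotes the $(\alpha, \beta)$ element of the inverse of the matrix $[C(t_\alpha, t_\beta)]_{\alpha, \beta = 1}^{n}$.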

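LEMMA 2.

$$E_0[(f_k^n)^2] = \exp \sum_i \sum_j k_i k_j \Phi^n(\phi_i, \phi_j).$$

PROOF.

$$f_k^n = \exp\Big(-\tfrac12 \sum_\alpha \sum_\beta \{[x(t_\alpha) - m(t_\alpha)][x(t_\beta) - m(t_\beta)] - x(t_\alpha)x(t_\beta)\}\, C^{-1}(t_\alpha, t_\beta)\Big),$$

$$E_0[(f_k^n)^2] = \int \cdots \int \frac{1}{(2\pi)^{n/2}\, \big|[C(t_\alpha, t_\beta)]\big|^{1/2}} \exp\Big(-\tfrac12 \sum_\alpha \sum_\beta \{[x(t_\alpha) - 2m(t_\alpha)][x(t_\beta) - 2m(t_\beta)] - 2m(t_\alpha)m(t_\beta)\}\, C^{-1}(t_\alpha, t_\beta)\Big)\, dx(t_1) \cdots dx(t_n)$$
$$= \exp \sum_\alpha \sum_\beta m(t_\alpha)m(t_\beta)\, C^{-1}(t_\alpha, t_\beta) = \exp \sum_i \sum_j k_i k_j \Phi^n(\phi_i, \phi_j).$$

From Theorem 1(iii), $E_0[(f_k^n)^2] \uparrow c_2$ for all $k$. This implies that $\Phi^n(\phi_i, \phi_j) \to \Phi(\phi_i, \phi_j)$, $i, j = 1, \ldots, s$.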
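The identity in Lemma 2 can be sanity-checked by simulation. The sketch below takes $n = 2$, the covariance matrix $\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$ and an arbitrary mean vector $(m_1, m_2)$, all hypothetical values chosen for illustration. It draws $x$ from $P_0$ (mean zero), forms $f_k^n = \exp\{x' C^{-1} m - \tfrac12 m' C^{-1} m\}$, which is the displayed $f_k^n$ after expanding the quadratic forms, and compares the sample mean of $(f_k^n)^2$ with $\exp(m' C^{-1} m)$.

```python
import math
import random


def quad_forms(rho, m):
    """Return C^{-1} m and m' C^{-1} m for C = [[1, rho], [rho, 1]]."""
    det = 1.0 - rho * rho
    cinv_m = ((m[0] - rho * m[1]) / det, (m[1] - rho * m[0]) / det)
    return cinv_m, m[0] * cinv_m[0] + m[1] * cinv_m[1]


def lemma2_check(rho=0.5, m=(0.2, 0.3), draws=200000, seed=2):
    """Monte Carlo estimate of E_0[(f_k^n)^2] next to the closed form exp(m' C^{-1} m)."""
    rng = random.Random(seed)
    cinv_m, q = quad_forms(rho, m)
    total = 0.0
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, x2 = z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # x ~ N(0, C) under P_0
        log_f = x1 * cinv_m[0] + x2 * cinv_m[1] - 0.5 * q
        total += math.exp(2.0 * log_f)
    return total / draws, math.exp(q)


if __name__ == "__main__":
    print(lemma2_check())
```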


THEOREM 3. If $\Phi(\phi_i, \phi_j)$ is finite for all $i$, $j$, and $D$, then $P_k$ is absolutely continuous with respect to $P_0$ and has exponential density

$$\exp\Big[\sum_i k_i \Phi(x, \phi_i) - \tfrac12 \sum_i \sum_j k_i k_j \Phi(\phi_i, \phi_j)\Big], \tag{2}$$

where $\Phi(x, \phi_i)$ is $\lim \Phi^n(x, \phi_i)$ if it exists and zero otherwise.

PROOF. From Theorem 1(i), $f_k^n \to f_k$ a.s. $P_0$. By assumption, $\lim E(f_k^n)^2 < \infty$, and hence by Theorem 1(iv) and (ii), $dP_k^D/dP_0^D = f_k$. This argument holds for any countable dense set $D$ in $T$. Thus, over any $\mathcal{B}^D$, depending on a set of dense coordinates, $P_k$ is absolutely continuous with respect to $P_0$. It follows that $P_k$ is absolutely continuous with respect to $P_0$ over $\mathcal{B}$. Continuity of $C(u, v)$ and the $\phi_i(t)$ implies continuity in quadratic mean and hence in probability. Thus by Theorem 2, $dP_k/dP_0 = f_k$ a.s. $P_0$.

This shows that, a.s. $P_0$, (2) is independent of $D$, and hence that $\Phi(\phi_i, \phi_j)$ is independent of $D$. It has been pointed out by E. Parzen that this is an immediate consequence of continuity of $C$ and the $\phi_i$. Thus the assumptions of the theorem can be weakened to require $\Phi(\phi_i, \phi_j)$, $i, j = 1, \ldots, s$, finite only for one dense set $D$.

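COROLLARY 2. The estimates

$$\hat k_i = \sum_{j=1}^{s} \Phi^{-1}(\phi_i, \phi_j)\, \Phi(x, \phi_j), \qquad i = 1, 2, \ldots, s, \tag{3}$$

where $\Phi^{-1}(\phi_i, \phi_j)$ denotes the $(i, j)$ element of the inverse of the matrix $[\Phi(\phi_i, \phi_j)]$, minimize $E[\hat m(t) - m(t)]^2$, with $\hat m(t) = \sum_i \hat k_i \phi_i(t)$, among all unbiased estimates of $m(t)$ for each $t \in T$.

PROOF. The statistic $[\Phi(x, \phi_1), \ldots, \Phi(x, \phi_s)]$ is sufficient and complete for the exponential family (2). This result then is immediate from Theorems 1 and 2 of [5].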
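On a finite grid, (3) is the generalized least-squares formula with $\Phi^n$ in place of $\Phi$. The sketch below is an illustration under assumptions not in the paper: an Ornstein-Uhlenbeck covariance $C(u, v) = \sigma^2 e^{-\beta|u-v|}$ observed at $n$ equally spaced points on $[0, T]$ (its inverse is tridiagonal, so no general matrix inversion is needed), and the two regressors $\phi_1(t) = 1$, $\phi_2(t) = t$; the function names are hypothetical. Because (3) is linear in $x$ and $\sum_j \Phi^{-1}(\phi_i, \phi_j)\Phi^n(\phi_j, \phi_l) = \delta_{il}$, feeding in a noise-free path $x = m$ returns $k$ exactly.

```python
import math


def ou_precision(beta, sigma2, times):
    """Tridiagonal inverse of C(u, v) = sigma2 * exp(-beta |u - v|) at equally spaced times.

    Returns (diagonal, off_diagonal). Requires at least two time points.
    """
    tau = times[1] - times[0]
    rho = math.exp(-beta * tau)
    scale = 1.0 / (sigma2 * (1.0 - rho * rho))
    diag = [scale * (1.0 + rho * rho)] * len(times)
    diag[0] = diag[-1] = scale
    return diag, [-scale * rho] * (len(times) - 1)


def phi_n(u, v, prec):
    """Phi^n(u, v) = sum_{a,b} u_a v_b C^{-1}(t_a, t_b) for a tridiagonal precision matrix."""
    diag, off = prec
    total = sum(d * a * b for d, a, b in zip(diag, u, v))
    total += sum(e * (u[i] * v[i + 1] + u[i + 1] * v[i]) for i, e in enumerate(off))
    return total


def estimate_k(x, regressors, prec):
    """Estimates (3) for s = 2 regressors, solving the 2 x 2 system explicitly."""
    (a, b), (c, d) = [[phi_n(p, q, prec) for q in regressors] for p in regressors]
    r1, r2 = (phi_n(x, p, prec) for p in regressors)
    det = a * d - b * c
    return ((d * r1 - b * r2) / det, (a * r2 - c * r1) / det)


if __name__ == "__main__":
    n, T = 201, 2.0
    times = [T * i / (n - 1) for i in range(n)]
    prec = ou_precision(beta=1.5, sigma2=0.8, times=times)
    phis = [[1.0] * n, list(times)]
    m = [0.7 + 2.0 * t for t in times]  # noise-free path with k = (0.7, 2.0)
    print(estimate_k(m, phis, prec))
```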

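For the particular case that $x(t)$ is an Ornstein-Uhlenbeck process with $0 \le t \le T$,

$$C(u, v) = \sigma^2 e^{-\beta|u - v|} \tag{4}$$

and

$$\Phi(\phi_i, \phi_j) = \frac{1}{2\sigma^2}\Big[\phi_i(0)\phi_j(0) + \phi_i(T)\phi_j(T) + \frac{1}{\beta}\int_0^T \phi_i'(t)\phi_j'(t)\,dt + \beta\int_0^T \phi_i(t)\phi_j(t)\,dt\Big].$$

In Theorem 1 of [6] it is shown that the estimates (3) minimize

$$\int_0^T E[\hat m(t) - m(t)]^2\,dt$$

among all linear unbiased estimates. Corollary 2 considerably strengthens this result.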


4. Correlation parameter in the Ornstein-Uhlenbeck process. Let $x(t)$ and $y(t)$ be normal processes with mean zero and nonsingular continuous covariances $C_0(u, v)$ and $C_1(u, v)$. As in the preceding section, application of Theorems 1(iv) and 2 to show that the $y(t)$ process is absolutely continuous with respect to the $x(t)$ process requires the computation of

$$E(f^n)^r = \int \Big[\frac{f_1^n(x)}{f_0^n(x)}\Big]^r f_0^n(x)\, d\mu^n(x).$$

In this expression $\mu^n$ is $n$-dimensional Lebesgue measure and $f_i^n(x)$ is the normal density with mean zero and covariance matrix

$$C_i^n(\alpha, \beta) = C_i(t_\alpha, t_\beta), \qquad \alpha, \beta = 1, \ldots, n;\ i = 0, 1,$$

with respect to a preassigned dense set $D = (t_1, t_2, \ldots)$. If the matrix $r(C_1^n)^{-1} - (r - 1)(C_0^n)^{-1}$ is positive definite, this can easily be computed:

$$\int \Big[\frac{f_1^n(x)}{f_0^n(x)}\Big]^r f_0^n(x)\, d\mu^n(x) = \int \frac{|C_0^n|^{(r-1)/2}}{(2\pi)^{n/2}\,|C_1^n|^{r/2}} \exp\Big\{-\tfrac12\, x\big[r(C_1^n)^{-1} - (r-1)(C_0^n)^{-1}\big]x'\Big\}\, d\mu^n(x)$$
$$= \Big(\frac{|C_0^n|}{|C_1^n|}\Big)^{r/2} |C_0^n|^{-1/2}\, \big|r(C_1^n)^{-1} - (r-1)(C_0^n)^{-1}\big|^{-1/2}. \tag{5}$$

It must then be shown that for some $r > 1$, the limit of this expression is finite as $n \to \infty$.
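Because the Ornstein-Uhlenbeck covariance has a tridiagonal inverse, (5) can be evaluated for large $n$ without forming any dense matrix. The sketch below does this for two stationary Ornstein-Uhlenbeck processes with a common $K = 2\beta\sigma^2$ (the second case treated in this section), assuming, for illustration only, $n$ equally spaced points covering $[0, T]$ and the hypothetical values $\beta_0 = 1$, $\beta_1 = 1.5$, $r = 1.5$. Determinants come from the three-term continuant recurrence, which also detects a failure of positive definiteness. By Theorem 1(iii) the value can only increase along nested grids; the question raised in the text is whether it stays bounded.

```python
import math


def tridiag_logdet(diag, off):
    """log det of a symmetric tridiagonal matrix via the continuant recurrence.

    Tracks ratios R_k = D_k / D_{k-1}, where D_k = d_k D_{k-1} - e_{k-1}^2 D_{k-2},
    so large n neither overflows nor underflows. All D_k > 0 iff the matrix is
    positive definite (Sylvester's criterion); otherwise ValueError is raised.
    """
    logdet, ratio = 0.0, 1.0
    for k, d in enumerate(diag):
        ratio = d if k == 0 else d - off[k - 1] ** 2 / ratio
        if ratio <= 0.0:
            raise ValueError("matrix is not positive definite")
        logdet += math.log(ratio)
    return logdet


def ou_precision(beta, K, tau, n):
    """Inverse covariance of a stationary OU process, sigma^2 = K / (2 beta), at n >= 2 points tau apart."""
    rho = math.exp(-beta * tau)
    scale = 2.0 * beta / (K * (1.0 - rho * rho))
    diag = [scale * (1.0 + rho * rho)] * n
    diag[0] = diag[-1] = scale
    return diag, [-scale * rho] * (n - 1)


def log_moment(beta0, beta1, K, T, n, r):
    """log of (5): the r-th moment of f^n = f_1^n / f_0^n under the beta0 process."""
    tau = T / (n - 1)
    d0, e0 = ou_precision(beta0, K, tau, n)  # (C_0^n)^{-1}
    d1, e1 = ou_precision(beta1, K, tau, n)  # (C_1^n)^{-1}
    da = [r * a - (r - 1.0) * b for a, b in zip(d1, d0)]
    ea = [r * a - (r - 1.0) * b for a, b in zip(e1, e0)]
    lp0, lp1 = tridiag_logdet(d0, e0), tridiag_logdet(d1, e1)
    la = tridiag_logdet(da, ea)  # log |r (C_1^n)^{-1} - (r - 1)(C_0^n)^{-1}|
    # log|C_i^n| = -log|(C_i^n)^{-1}|, so the logarithm of (5) is:
    return 0.5 * r * (lp1 - lp0) + 0.5 * lp0 - 0.5 * la


if __name__ == "__main__":
    for n in (11, 101, 1001, 10001):
        print(n, math.exp(log_moment(1.0, 1.5, K=1.0, T=1.0, n=n, r=1.5)))
```

With $r = 1$ the expression reduces to $\int f_1^n\,d\mu^n = 1$, and with $\beta_1 = \beta_0$ it is 1 for every $r$; both serve as quick consistency checks of the code.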
Consider first $x(t)$ a Wiener process with parameter $K$ and $y(t)$ determined by the linear transformation

$$x(t) = y(t) + \beta \int_0^t y(s)\, ds.$$

This satisfies the conditions of Cameron and Martin [2], so that the density of $y(t)$ with respect to $x(t)$ can be computed from their paper. The process determined by this transformation is $y(t) = z(t) - e^{-\beta t} z(0)$, where $z(t)$ is an Ornstein-Uhlenbeck process (4) with parameters $\beta$ and $\sigma^2 = K/2\beta$. The measures on $\mathcal{X}(T)$ for these processes will be denoted by $P_K$ and $P_{\beta,\sigma}$. Ordering time points of the form $iT/2^m$, $i = 1, \ldots, 2^m$, in the obvious way and extracting appropriate subsequences, it is sufficient to consider (5) for $t_i = i\tau$, $i = 1, \ldots, n$; $\tau = T/n$ and $n \to \infty$. Then for the processes just described

$$r(C_1^n)^{-1} - (r-1)(C_0^n)^{-1} = \frac{2\beta r}{K(1 - e^{-2\beta\tau})}\, M_1 - \frac{r-1}{K\tau}\, M_0,$$

where $M_1$ is the $n \times n$ tridiagonal matrix with diagonal $(1 + e^{-2\beta\tau}, \ldots, 1 + e^{-2\beta\tau}, 1)$ and off-diagonal entries $-e^{-\beta\tau}$, and $M_0$ is the $n \times n$ tridiagonal matrix with diagonal $(2, \ldots, 2, 1)$ and off-diagonal entries $-1$.


For the case of two Ornstein-Uhlenbeck processes $P_{\beta_0,\sigma_0}$ and $P_{\beta_1,\sigma_1}$ with parameters $\beta_0$, $\sigma_0^2 = K/2\beta_0$ and $\beta_1$, $\sigma_1^2 = K/2\beta_1$, respectively,

$$r(C_1^n)^{-1} - (r-1)(C_0^n)^{-1} = \frac{2\beta_1 r}{K(1 - e^{-2\beta_1\tau})}\, N(e^{-\beta_1\tau}) - \frac{2\beta_0 (r-1)}{K(1 - e^{-2\beta_0\tau})}\, N(e^{-\beta_0\tau}),$$

where $N(\rho)$ is the $n \times n$ tridiagonal matrix with diagonal $(1, 1 + \rho^2, \ldots, 1 + \rho^2, 1)$ and off-diagonal entries $-\rho$.




In both cases all principal minors have the form of a tridiagonal matrix whose entries are denoted by $\alpha$, $\gamma$, and $\lambda$ below [the display is not recoverable from the scan]. The determinant of such a matrix of order $i \ge 3$ is given by an explicit formula [likewise not recoverable from the scan].

In the first case take $r = 2$, and in the second case take $r > 1$ satisfying $(\beta_1/\beta_0)^2 > (r - 1)/r$. Then for $\tau$ sufficiently small it can be shown that $0 < -\lambda < b$ and $\alpha, \gamma > b$, for a common bound $b$ that is illegible in the scan. These inequalities imply

$$\alpha\gamma - \lambda^2 > 0$$

and that a second expression [illegible in the scan] exceeds

$$-\lambda\alpha\gamma - \alpha\lambda^2 - \lambda^2\gamma - \lambda^3 = (-\lambda)(\alpha + \lambda)(\gamma + \lambda) > 0,$$

and hence that for $\tau$ sufficiently small the matrices in question are positive definite. It is easily verified that minors for $i = 1, 2$ are positive. Thus (5) is valid, and, using the formula above for the determinants involved, a routine computation shows that the limit of (5) as $\tau \to 0$ is finite in both cases.

Thus it has been established that in each case $f_1(x)/f_0(x)$ converges a.s. to the desired density. The form of this limit is easily seen to be $ce^Y$, where $c$ is a constant and $Y$ is the a.s. limit of $\tfrac{1}{2}x[(C^0)^{-1} - (C^1)^{-1}]x'$. In the two cases considered this limit can be expressed in a more convenient form by expanding the terms of the sequence as follows:


$$\tfrac{1}{2}x[(C^0)^{-1} - (C^1)^{-1}]x' = A(\tau)x_0^2 + B(\tau)x_T^2 + C(\tau)\sum_{i=1}^{n}\tau x_i^2 + D(\tau)\sum_{i=1}^{n}(x_i - x_{i-1})^2.$$


It has been shown [7] for the two processes concerned (the Wiener process and the Ornstein-Uhlenbeck process) that

$$\sum_{i=1}^{n}\tau x_i^2 \to \int_0^T x_t^2\,dt \quad\text{and}\quad \sum_{i=1}^{n}(x_i - x_{i-1})^2 \to K,$$








566 CHARLOTTE T. STRIEBEL 


where convergence is in quadratic mean. If subsequences are selected so that convergence is a.s., then

$$Y = Ax_0^2 + Bx_T^2 + C\int_0^T x_t^2\,dt + DK,$$

where $A(\tau) \to A$, etc. Routine computation of the coefficients then gives the final results.





dP. _ oe 2 ¢ ‘f° 2 | 
~~? ex aK Ee KT) +8 , dt 
dP oy 6, 


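$$\frac{dP_{\beta_1,\sigma_1}}{dP_{\beta_0,\sigma_0}} = \sqrt{\frac{\beta_1}{\beta_0}}\,\exp\left\{-\frac{1}{2K}\left[(\beta_1 - \beta_0)(x_0^2 + x_T^2 - KT) + (\beta_1^2 - \beta_0^2)\int_0^T x_t^2\,dt\right]\right\}$$

for $2\sigma^2\beta = 2\sigma_0^2\beta_0 = 2\sigma_1^2\beta_1 = K$.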
From the second density, it is seen that

$$\left(x_0^2 + x_T^2,\ \int_0^T x_t^2\,dt\right)$$

is a sufficient statistic for $\beta$. However, since this family is not complete, no minimum variance unbiased estimate of $\beta$ can be found as in the previous section.
The maximum likelihood estimate of $\beta$ is given by

$$\hat\beta = \frac{-(x_0^2 + x_T^2 - KT) + \left[(x_0^2 + x_T^2 - KT)^2 + 8K\int_0^T x_t^2\,dt\right]^{1/2}}{4\int_0^T x_t^2\,dt}.$$


For $T$ large, this is approximated by

$$\hat\beta \approx \frac{KT}{2\int_0^T x_t^2\,dt}$$

and for $T$ small by

$$\hat\beta \approx \frac{K}{x_0^2 + x_T^2}.$$
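The estimate above is the positive root of the quadratic $2I\beta^2 + S\beta - K = 0$, where $S = x_0^2 + x_T^2 - KT$ and $I = \int_0^T x_t^2\,dt$; this quadratic comes from setting to zero the derivative of the logarithm of the second density with respect to $\beta_1$. The Python sketch below evaluates it together with the two approximations. It is only an illustration: it assumes $K$ is known and that the three path statistics have already been computed, and the function names and sample numbers are hypothetical.

```python
import math

def ou_beta_mle(x0, xT, int_x2, K, T):
    """Positive root of 2*I*b**2 + S*b - K = 0, with S = x0^2 + xT^2 - K*T and I = int_0^T x_t^2 dt."""
    S = x0 ** 2 + xT ** 2 - K * T
    root = math.sqrt(S * S + 8.0 * K * int_x2)
    if S >= 0:
        # algebraically equal to (-S + root) / (4 I), but avoids cancellation when S > 0
        return 2.0 * K / (S + root)
    return (-S + root) / (4.0 * int_x2)

def ou_beta_large_T(int_x2, K, T):
    """Large-T approximation K T / (2 int_0^T x_t^2 dt)."""
    return K * T / (2.0 * int_x2)

def ou_beta_small_T(x0, xT, K):
    """Small-T approximation K / (x0^2 + xT^2)."""
    return K / (x0 ** 2 + xT ** 2)

# Made-up statistics for a long record: K = 1, T = 1000, int x^2 dt = 1000, x0 = xT = 1.
print(ou_beta_mle(1.0, 1.0, 1000.0, 1.0, 1000.0), ou_beta_large_T(1000.0, 1.0, 1000.0))  # 0.5 0.5
```

Splitting on the sign of $S$ is only a numerical precaution; both branches compute the same root.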
If the continuous process is observed, $K$ can always be considered as known, since

$$\sum_{i=1}^{n}(x_i - x_{i-1})^2 \to K$$

in probability as $\tau \to 0$.
5. Acknowledgments. I am indebted to Professor David Blackwell for proposing the problem and to Professor Lucien LeCam for many valuable suggestions.







REFERENCES 


[1] ULF GRENANDER, "Stochastic processes and statistical inference," Arkiv för Matematik, Vol. 1 (1950), pp. 195-277.

[2] R. H. CAMERON AND W. T. MARTIN, "Transformation of Wiener integrals under a general class of linear transformations," Trans. Amer. Math. Soc., Vol. 58 (1945), pp. 184-219.

[3] J. L. DOOB, Stochastic Processes, John Wiley and Sons, Inc., New York, 1953.

[4] P. R. HALMOS AND L. J. SAVAGE, "Applications of the Radon-Nikodym theorem to the theory of sufficient statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 225-241.

[5] DAVID BLACKWELL, "Conditional expectation and unbiased sequential estimation," Ann. Math. Stat., Vol. 18 (1947), pp. 105-110.

[6] H. B. MANN AND P. B. MORANDA, "On the efficiency of the least square estimates of parameters in the Ornstein-Uhlenbeck process," Sankhyā, Vol. 13 (1954), pp. 351-358.

[7] H. B. MANN, "Introduction to the theory of stochastic processes depending on a continuous parameter," National Bureau of Standards Applied Mathematics Series, No. 24, 1953.





THE SUPREMUM AND INFIMUM OF THE POISSON PROCESS¹


By RONALD PYKE
Columbia University 


1. Introduction. Let $\{X(t);\ t \ge 0\}$ be a separable Poisson process with shift such that

$$\log E\left(e^{iwX(t)}\right) = -itwa + \lambda t\left(e^{iw} - 1\right) \tag{1}$$

for all real $w$, and $a, \lambda > 0$. Set

$$\sigma(x, T) = P\Big[\sup_{0 \le t \le T} X(t) \le x\Big].$$

The task of obtaining $\sigma(x, T)$ explicitly for general stochastic processes is intrinsically difficult. However, Baxter and Donsker [1], following the methods and results of Spitzer, have obtained the double Laplace transform of $\sigma(x, T)$ for processes with stationary and independent increments. Their result as it pertains to the Poisson process is as follows.
THEOREM. Let $\{X(t);\ t \ge 0\}$ be a separable process satisfying $X(0) = 0$ a.s. and

$$\log E\left(e^{iwX(t)}\right) = t\psi(w)$$

for all $t \ge 0$, where $\exp(\psi(w))$ is the Lévy-Khintchine representation of the characteristic function of an infinitely divisible distribution. If $\psi(w)$ is complex and for some $\delta > 0$,

$$\int_{-\delta}^{\delta}\left|\frac{\psi(w)}{w}\right|dw < \infty,$$

then for all $u, v > 0$,

$$u\int_0^\infty\int_{0-}^\infty e^{-uT - vx}\,d_x\sigma(x, T)\,dT = \exp\left\{\frac{1}{2\pi}\int_u^\infty\int_{-\infty}^{\infty}\frac{v\,\psi(w)}{w(w - iv)\,s[s - \psi(w)]}\,dw\,ds\right\}. \tag{2}$$
Theoretically, therefore, to obtain $\sigma(x, T)$ explicitly, one should evaluate the double integral on the right hand side of (2) and then perform a double inversion on it. For most cases this is virtually impossible except by numerical methods. Baxter and Donsker, however, have evaluated the right hand side of (2) for several important cases. Moreover, for the Gaussian process and for the process determined by coin tossing at random times, they were able to make the inversions.


Received June 27, 1958; revised December 10, 1958. 

¹ This work was sponsored in part by the Office of Naval Research while the author was at Stanford University. Reproduction in whole or in part is permitted for any purpose of the United States Government.







SUP AND INF OF POISSON PROCESS 
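For the Poisson process, Baxter and Donsker showed that

$$u\int_0^\infty\int_{0-}^\infty e^{-uT - vx}\,d_x\sigma(x, T)\,dT = \left(1 - \frac{v}{s_u}\right)\left[1 - \frac{\psi(iv)}{u}\right]^{-1} \tag{3}$$

where $s_u$ satisfies $\operatorname{Re}(s_u) \ge 0$ and

$$u = as_u + \lambda\left(e^{-s_u} - 1\right).$$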


From (3) they obtained $\sigma(x, +\infty)$. It is the purpose of this paper to obtain $\sigma(x, T)$ for finite $T$, which is done in Section 3. In Section 4, the equation corresponding to (3), as well as the exact distribution, are derived for the infimum of this process. Applications of these results to queueing theory are given in Section 5. First of all, a lemma with applications to the theory of distribution-free statistics is proven.


2. A lemma. Let $X_1, X_2, \dots, X_n$ be independent random variables on a common measure space $(\Omega, \mathcal{A}, P)$ such that $P[X_i \le x] = x$ for all $0 \le x \le 1$. Define $U_j$ as the $j$th smallest component of $(X_1, X_2, \dots, X_n)$. Therefore $U_j$ is well defined a.s. Define for all real $x$, $a$ and integral $n$

$$F(x : a, n) = P\Big[\max_{1 \le i \le n}(ia - U_i) \le x\Big].$$
lsisn 


LEMMA 1. For $0 \le a \le 1$, $0 \le na - x < 1$,

$$F(x : a, n) = (1 + x - na)\sum_{j=0}^{[x/a]}\binom{n}{j}(ja - x)^j(1 + x - ja)^{n-j-1} \tag{4}$$

where $[y]$, the greatest integer contained in $y$, is a left continuous function. When $na - x \ge 1$ or $na - x < 0$, $F(x : a, n)$ is equal to 0 or 1 respectively.

PROOF. It may be shown that whenever $0 \le na - x < 1$, $F(x : a, n)$ has the representation

$$F(x : a, n) = n!\int_{na-x}^{1}dz_n\int_{(n-1)a-x}^{z_n}dz_{n-1}\cdots\int_{ka-x}^{z_{k+1}}dz_k\int_0^{z_k}dz_{k-1}\cdots\int_0^{z_2}dz_1,$$

where $k = [x/a] + 1$. That this multiple integral reduces to the expression (4) may be shown by a generalization of the method of Birnbaum and Tingey [7] along with an application of Lemma 1 of [2]. The evaluation of an essentially equivalent multiple integral has already been given by Chapman in equation (3) of [9].² Lemma 1 has also been obtained using probabilistic methods by Dempster [10].
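Formula (4) is straightforward to evaluate. The Python sketch below (the function names are hypothetical) transcribes it and compares it with a seeded Monte Carlo estimate of $F(x : a, n)$. It assumes $a > 0$ and $x \ge 0$ and uses the ordinary floor for $[\,\cdot\,]$: where $x/a$ is a positive integer the extra term carries the factor $(ja - x)^j = 0$, and at $x = 0$ the $j = 0$ term is taken to be 1.

```python
import math
import random

def F_lemma1(x, a, n):
    """F(x : a, n) = P[max_{1<=i<=n} (i*a - U_i) <= x] from (4); assumes a > 0, x >= 0."""
    d = n * a - x
    if d >= 1:
        return 0.0
    if d < 0:
        return 1.0
    total = 0.0
    for j in range(min(math.floor(x / a), n) + 1):
        # binom(n, j) (ja - x)^j (1 + x - ja)^(n-j-1); Python evaluates 0.0 ** 0 as 1.0
        total += math.comb(n, j) * (j * a - x) ** j * (1 + x - j * a) ** (n - j - 1)
    return (1 + x - n * a) * total

def F_monte_carlo(x, a, n, reps=100_000, seed=1):
    """Seeded simulation of the same probability from n sorted uniforms."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        if max((i + 1) * a - u[i] for i in range(n)) <= x:
            hits += 1
    return hits / reps

print(F_lemma1(0.8, 0.5, 3), F_monte_carlo(0.8, 0.5, 3))
```

For $n = 2$ the formula can also be checked by hand: with $a = 0.5$, $x = 0.7$ it gives $0.7(1.7 - 0.4) = 0.91 = P[U_2 \ge 0.3]$.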


3. The derivation of $\sigma(x, T)$. Let $\{Y(t),\ t \ge 0\}$ be a Poisson process with parameter $\lambda > 0$; that is,

$$E\left\{e^{iwY(t)}\right\} = e^{\lambda t(e^{iw} - 1)}.$$


² I am grateful to the referee for this reference.







Write $X(t) = Y(t) - at$. Then

$$\sigma(x, T) = P\Big[\sup_{0 \le t \le T} X(t) \le x\Big] = P[Y(t) \le at + x;\ 0 \le t \le T].$$


It is well known (cf. [3], Chap. VIII) that the conditional distribution of the first $n$ discontinuity points of the Poisson process, given that there were $n$ such points in $(0, T)$, is the distribution of $(TU_1, TU_2, \dots, TU_n)$, where the $U_i$'s are as defined above. Using this fact, one may write


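$$\sigma(x, T) = \sum_{n=0}^{[aT+x]} e^{-\lambda T}\frac{(\lambda T)^n}{n!}\,P\Big[\max_{1 \le i \le n}\Big(\frac{i}{aT} - U_i\Big) \le \frac{x}{aT}\Big].$$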
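Evaluating the summands by means of Lemma 1 gives

$$\sigma(x, T) = \sum_{n=0}^{[x]} e^{-\lambda T}\frac{(\lambda T)^n}{n!} + \sum_{n=[x]+1}^{[aT+x]} e^{-\lambda T}\frac{(\lambda T)^n}{n!}(aT + x - n)(aT)^{-n}\sum_{j=0}^{[x]}\binom{n}{j}(j - x)^j(aT + x - j)^{n-j-1}. \tag{5}$$

By Lemma 2 of [2], (5) may be rewritten to give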


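THEOREM 1. Let $\{Y(t),\ t \ge 0\}$ be a Poisson process with parameter $\lambda > 0$. Then for all $a, x > 0$

$$\sigma(x, T) = \sum_{n=0}^{[aT+x]} e^{-\lambda T}\frac{(\lambda T)^n}{n!}(aT + x - n)(aT)^{-n}\sum_{j=0}^{[x]}\binom{n}{j}(j - x)^j(aT + x - j)^{n-j-1} \tag{6}$$

where $\binom{n}{j} = 0$ for $j > n$.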
COROLLARY. For all $T > 0$, $a > 0$,

$$\sigma(0, T) = e^{-\lambda T}\sum_{n=0}^{[aT]}\frac{(\lambda T)^n}{n!}\left(1 - \frac{n}{aT}\right).$$
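Theorem 1 and its Corollary can be checked against each other numerically. The Python sketch below (hypothetical function names) evaluates (6) in log space, so that $(\lambda T)^n/n!$ does not overflow for large $T$, and compares it with the Corollary at $x = 0$. It assumes $x \ge 0$, uses the floor for $[\,\cdot\,]$, and treats $0^0$ as 1.

```python
import math

def sigma_theorem1(x, T, lam, a):
    """sigma(x, T) = P[sup_{0<=t<=T} (Y(t) - a t) <= x] from (6); assumes x >= 0 and T, lam, a > 0."""
    c = a * T + x
    total = 0.0
    for n in range(math.floor(c) + 1):
        if c - n <= 0:
            continue                                  # the factor (aT + x - n) is zero
        for j in range(min(math.floor(x), n) + 1):
            if j > 0 and j == x:
                continue                              # (j - x)^j = 0
            # log|term|, using (lam T)^n (aT)^(-n) = (lam/a)^n and binom(n, j)/n! = 1/(j! (n-j)!)
            log_term = (-lam * T + n * math.log(lam / a) + math.log(c - n)
                        - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                        + (j * math.log(x - j) if j > 0 else 0.0)
                        + (n - j - 1) * math.log(c - j))
            total += (-1) ** j * math.exp(log_term)  # (j - x)^j has sign (-1)^j since j < x
    return total

def sigma_corollary(T, lam, a):
    """sigma(0, T) = e^{-lam T} sum_{n=0}^{[aT]} (lam T)^n / n! (1 - n/(aT))."""
    aT = a * T
    return sum(math.exp(-lam * T + n * math.log(lam * T) - math.lgamma(n + 1)) * (1 - n / aT)
               for n in range(math.floor(aT) + 1))

print(sigma_theorem1(0.0, 3.7, 0.8, 1.3), sigma_corollary(3.7, 0.8, 1.3))
```

When $aT + x < 1$ only the $n = 0$ term survives and the sketch returns $e^{-\lambda T}$, the probability of no jump in $(0, T]$.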


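The limiting case $\sigma(x, +\infty)$ may be obtained from Theorem 1 by an application of the Central Limit Theorem for Poisson variables. More specifically, by the Central Limit Theorem, for $j \le [x]$,

$$\lim_{T \to \infty} e^{-\lambda T - (x-j)\lambda/a}\sum_{n=j}^{[aT+x]}\frac{\{\lambda T + (x - j)\lambda/a\}^{n-j-1}\{aT + x - n\}\lambda/a}{(n - j)!} = \begin{cases}1 - \lambda/a, & \text{if } \lambda < a,\\ 0, & \text{if } \lambda \ge a.\end{cases}$$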


Therefore, from Theorem 1, for $x \ge 0$

$$\sigma(x, +\infty) = \left(1 - \frac{\lambda}{a}\right)\sum_{j=0}^{[x]}\frac{(\lambda/a)^j(j - x)^j}{j!}\,e^{(x-j)\lambda/a} \tag{7}$$







when $\lambda < a$, and is equal to zero otherwise. This formula disagrees with (4.15) of [1]. For $x = 0$, (7) becomes

$$\sigma(0, +\infty) = \begin{cases}1 - \lambda/a, & \text{if } \lambda/a < 1,\\ 0, & \text{otherwise.}\end{cases}$$

The expression (7) has also been obtained by Breakwell [4], who has computed a constant multiple of (7) for several values of the parameters.
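A direct transcription of (7) follows as a Python sketch (the function name is hypothetical). For $x = 0$ it reduces to $1 - \lambda/a$, and for $\lambda \ge a$ it returns zero, as stated above.

```python
import math

def sigma_sup_infinite(x, lam, a):
    """sigma(x, +inf) from (7) for x >= 0; zero when lam >= a."""
    rho = lam / a
    if rho >= 1:
        return 0.0
    s = sum((rho * (j - x)) ** j * math.exp(rho * (x - j)) / math.factorial(j)
            for j in range(math.floor(x) + 1))   # Python evaluates 0.0 ** 0 as 1.0
    return (1 - rho) * s

for x in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(x, sigma_sup_infinite(x, 0.5, 1.0))
```

The terms alternate in sign, so for large $x$ some cancellation occurs; in double precision this is harmless for moderate $x$.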


4. The infimum of $\{X(t),\ t \ge 0\}$. It is also of interest to study the infimum of the deviations of the Poisson process about the line $at$. Since

$$\inf_{0 \le t \le T} X(t) = -\sup_{0 \le t \le T}\{-X(t)\},$$

a double generating function, $\Phi(u, v)$ say, may be obtained for the infimum by the same methods as used for the supremum by Baxter and Donsker [1]. It follows from (2) that for the infimum

$$u\Phi(u, v) = \exp\left\{\frac{1}{2\pi}\int_u^\infty\int_{-\infty}^{\infty}\frac{v\,\psi(w)}{w(w + iv)\,s[s - \psi(w)]}\,dw\,ds\right\}.$$


By an application of Rouché's theorem, it may be shown that for all $s > 0$, $\psi(z) - s$ has as many roots in the upper half plane, $\operatorname{Im} z > 0$, as $h(z) = iaz + \lambda + s$, namely one. Denote this root by $iy_s$; that is

$$\psi(iy_s) = s = ay_s + \lambda\left(e^{-y_s} - 1\right).$$


Since $i\bar{y}_s$ is also a root, its uniqueness implies that $y_s$ is real. Moreover the root is simple. Straightforward integration in the upper half plane yields

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{v\,\psi(w)}{w(w + iv)\,s[s - \psi(w)]}\,dw = -\frac{v}{y_s(y_s + v)}\,\frac{dy_s}{ds}.$$

Consequently,

$$u\Phi(u, v) = \left(1 + \frac{v}{y_u}\right)^{-1}.$$


Although unable to make a double inversion of $\Phi$, one may show that

$$\lim_{u \to 0} u\Phi(u, v) = \left(1 + \frac{v}{y_0}\right)^{-1}.$$

Under the assumption $a < \lambda$, $y_0$ is the unique positive root of the real function $ay + \lambda(e^{-y} - 1)$. When $a \ge \lambda$, $y_0 = 0$ and the above limit is defined to be zero. It follows, in particular, that for $a < \lambda$, the infimum over $[0, \infty)$ of $X(t)$ has an exponential distribution with parameter $y_0$.
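The root $y_0$ can be found by bisection, since $g(y) = ay + \lambda(e^{-y} - 1)$ is convex with $g(0) = 0$, $g'(0) = a - \lambda < 0$ and $g(\lambda/a) = \lambda e^{-\lambda/a} > 0$ when $a < \lambda$. The Python sketch below (the names, the seed and the truncation after a fixed number of jumps are our own assumptions) computes $y_0$ and compares $P[\inf X(t) \le -1] = e^{-y_0}$ with a seeded simulation. Between jumps $X(t)$ decreases linearly, so its infimum is attained just before a jump; truncating each path after a few hundred jumps is harmless when $a < \lambda$ because $X(t)$ then drifts upward.

```python
import math
import random

def y0_root(lam, a):
    """Positive root of a*y + lam*(exp(-y) - 1) = 0 when a < lam; 0 otherwise (bisection)."""
    if a >= lam:
        return 0.0
    g = lambda y: a * y + lam * math.expm1(-y)
    lo, hi = 1e-8, lam / a            # g(lo) < 0 < g(hi), assuming lam - a is not tiny
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulated_infimum(lam, a, n_jumps, rng):
    """Infimum of X(t) = Y(t) - a*t over the first n_jumps jumps of a rate-lam Poisson process."""
    t, k, m = 0.0, 0, 0.0
    for _ in range(n_jumps):
        t += rng.expovariate(lam)
        m = min(m, k - a * t)        # X just before the jump at time t
        k += 1
    return m

rng = random.Random(2024)
lam, a = 2.0, 1.0
y0 = y0_root(lam, a)
samples = [simulated_infimum(lam, a, 200, rng) for _ in range(4000)]
print(y0, math.exp(-y0), sum(s <= -1.0 for s in samples) / len(samples))
```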

In the following, the explicit distribution function of the infimum over finite intervals of the Poisson process is derived. Moreover, an expression for the distribution function of the infimum over $[0, \infty)$ is obtained. It is shown by a second method that this latter distribution is exponential. The advantage of the second method is that the parameter $y_0$ as defined above is obtained explicitly.





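For $x \le 0$ set

$$\mu(x, T) = P\Big[\inf_{0 \le t \le T} X(t) \le x\Big] = 1 - P[Y(t) > at + x,\ 0 \le t \le T]$$

for all $T \ge 0$. It is clear that whenever $aT + x < 0$, $\mu(x, T) = 0$. Suppose $aT + x \ge 0$. Then by a similar argument to that used in Section 3,

$$\begin{aligned}
1 - \mu(x, T) &= \sum_{n=K+1}^{\infty} e^{-\lambda T}\frac{(\lambda T)^n}{n!}\,P\Big[U_i < \frac{i - 1 - x}{aT},\ i = 1, 2, \dots, n\Big]\\
&= \sum_{n=K+1}^{\infty} e^{-\lambda T}\frac{(\lambda T)^n}{n!}\,P\Big[\max_{1 \le i \le n}\Big(\frac{i}{aT} - U_i\Big) \le \frac{n - x - aT}{aT}\Big],
\end{aligned}$$

where $K = [aT + x]$, since the distribution functions of $(U_1, U_2, \dots, U_n)$ and $(1 - U_n, 1 - U_{n-1}, \dots, 1 - U_1)$ are the same. Therefore, by Lemma 1, one obtains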


u(z,T) =1+2 Pa or Oa)" 
= ") ee ee Pe a, 
(8) - ye or tal Powe : (n/ay'(r = a) 
. i : wer ee eon 


It may be shown, by rearranging the summations of the third term in (8) and by applying Lemma 2 of [2], that the sum of the first and third terms is zero. Therefore one obtains³

THEOREM 2. For all $x < 0$, $a, T \ge 0$, the distribution function of $\inf_{0 \le t \le T} X(t)$, $\mu(x, T)$, is given by

$$\mu(x, T) = -x\sum_{r=0}^{[aT+x]} e^{-\lambda(r-x)/a}\,\frac{(\lambda/a)^r(r - x)^{r-1}}{r!}.$$

As a consequence of this Theorem, it follows that

$$\mu(x, +\infty) = -x\sum_{r=0}^{\infty} e^{-\lambda(r-x)/a}\,\frac{(\lambda/a)^r(r - x)^{r-1}}{r!}, \tag{9}$$

since all the summands are positive. This distribution shall be shown to be exponential over $(-\infty, 0]$. Specifically, we shall prove


³ From discussions with Joseph M. Gani, it was learned that a result equivalent to Theorem 2, namely the first-passage-time distribution, has been obtained in dam storage theory (cf. [11]). The author is indebted to Gani for giving him access to the proofs of [11], without which the above simplification of (8) would not have been attempted.







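THEOREM 3. For all $x \le 0$, $\lambda > a > 0$,

$$\log \mu(x, +\infty) = \frac{\lambda x}{a}\{1 - \mu(-1, +\infty)\}. \tag{10}$$

It will be convenient to define for all real $x$ and $\beta < e^{-1}$

$$f(x, \beta) = x\sum_{r=0}^{\infty}(x + r)^{r-1}\frac{\beta^r}{r!}. \tag{11}$$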


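Since $f(x, \beta)$ is a power series in $\beta$ we have

$$f(x, \beta)f(y, \beta) = \sum_{n=0}^{\infty}\beta^n\sum_{j=0}^{n}\frac{x(x + j)^{j-1}\,y(y + n - j)^{n-j-1}}{j!\,(n - j)!}. \tag{12}$$

As a consequence of Lemma 2 of [2], the inner summation of (12) is equal to $(x + y)(x + y + n)^{n-1}/n!$, and so

$$f(x, \beta)f(y, \beta) = f(x + y, \beta).$$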


Since $f(x, \beta)$ is a continuous function of $x$ when $f(0, \beta)$ is defined to equal 1, it is known that

$$f(x, \beta) = e^{xg(\beta)} \tag{13}$$

for some function $g(\beta)$ independent of $x$. For a fixed $\beta < e^{-1}$, $g(\beta)$ may be obtained by differentiation with respect to $x$ and taking limits as $x \to 0$. In this way one obtains

$$g(\beta) = \lim_{x \to 0}\frac{\partial}{\partial x}f(x, \beta) = \beta f(1, \beta). \tag{14}$$
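Relations (12) to (14) are easy to check numerically. The Python sketch below (the names and the fixed truncation point are hypothetical) evaluates $f(x, \beta)$ for $x \ge 0$ and $0 < \beta < e^{-1}$, where the terms decay roughly like $(\beta e)^r$, and compares $f(x, \beta)f(y, \beta)$ with $f(x + y, \beta)$ and $f(x, \beta)$ with $e^{x\beta f(1, \beta)}$.

```python
import math

def f_series(x, beta, terms=600):
    """f(x, beta) = x * sum_{r>=0} (x + r)^(r-1) beta^r / r!, truncated; assumes x >= 0, 0 < beta < 1/e."""
    total = 1.0                        # r = 0 term x * x^(-1) = 1; also the convention f(0, beta) = 1
    if x == 0:
        return total
    for r in range(1, terms):
        total += x * math.exp((r - 1) * math.log(x + r) + r * math.log(beta) - math.lgamma(r + 1))
    return total

def g(beta):
    """g(beta) = beta * f(1, beta), as in (14)."""
    return beta * f_series(1.0, beta)

beta = 0.3
print(f_series(0.7, beta) * f_series(1.3, beta), f_series(2.0, beta))
print(f_series(2.5, beta), math.exp(2.5 * g(beta)))
```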


Setting $\nu = \lambda/a$ and $\beta = \nu e^{-\nu}$ in (9) and (11) gives

$$\mu(x, +\infty) = e^{\nu x}f(-x, \nu e^{-\nu}) = \exp\{\nu x - x\nu e^{-\nu}f(1, \nu e^{-\nu})\} = \exp\{\nu x - \nu x\,\mu(-1, +\infty)\},$$

which is the desired result.

Upon setting $x = -1$ in (10), one obtains

$$\log \mu(-1, +\infty) = -\nu\{1 - \mu(-1, +\infty)\},$$

or equivalently, one has shown that $\nu\{1 - \mu(-1, +\infty)\}$ is a non-negative root of the equation

$$\lambda(e^{-z} - 1) + az = 0. \tag{15}$$

That is, in the notation of (3), $\nu\{1 - \mu(-1, +\infty)\} = s_0$.
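Theorem 3 therefore gives the rate of the exponential law explicitly as a series. The Python sketch below (hypothetical names; the truncation point is an assumption) sums (9) at $x = -1$ and checks that $\nu\{1 - \mu(-1, +\infty)\}$ solves (15). The terms behave roughly like $(\nu e^{1-\nu})^r$, so convergence slows as $\lambda/a$ approaches 1.

```python
import math

def mu_minus1_inf(lam, a, terms=2000):
    """mu(-1, +inf) from (9): sum_{r>=0} e^{-nu (r+1)} nu^r (r+1)^(r-1) / r!, nu = lam / a (truncated)."""
    nu = lam / a
    return sum(math.exp(-nu * (r + 1) + r * math.log(nu) + (r - 1) * math.log(r + 1) - math.lgamma(r + 1))
               for r in range(terms))

def exponential_rate(lam, a):
    """Explicit rate nu * (1 - mu(-1, +inf)) from Theorem 3 (meaningful for lam > a)."""
    nu = lam / a
    return nu * (1.0 - mu_minus1_inf(lam, a))

y = exponential_rate(2.0, 1.0)
print(y, 1.0 * y + 2.0 * (math.exp(-y) - 1.0))   # second value: residual of (15), close to 0
```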

The function $\mu(x, +\infty)$ is a special case of the ruin function studied in the theory of collective risk, which has already been shown to have an exponential form $e^{Rx}$, where $R$ is the unique positive solution of (15) (cf. Cramér [5]). The result for $a < \lambda$ contained in Theorem 3 is new in that it gives an explicit expression for $R$.







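Upon observing that the only non-negative root of (15) when $\lambda \le a$ is zero itself, we may in summary state that

$$\mu(x, +\infty) = \begin{cases}1, & \text{if } a \ge \lambda,\\ \exp\left\{\dfrac{\lambda x}{a}[1 - \mu(-1, +\infty)]\right\}, & \text{if } a < \lambda.\end{cases}$$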
5. Applications to queueing theory. Suppose that customers are arriving at times $n/a$, $n = 0, 1, 2, \dots$, and that the service time for the $j$th customer, $S_j$ say, is exponentially distributed with expectation $1/\lambda$. It is of importance in queueing theory to determine the distribution of the busy period of the server under the initial condition that there were $k$ people in the queue. To this end one must compute


$$B(T \mid k) = P[\text{server is busy throughout } (0, T] \mid k \text{ people in the queue at } t = 0].$$

Since additional customers are arriving at times $n/a$, $n = 1, 2, \dots$, we have

$$B(T \mid k) = P[S_1 + S_2 + \cdots + S_{n+k} \ge (n + 1)/a,\ 0 \le n \le aT] = P[Y(t) \le at + k - 1,\ 0 \le t \le T],$$

where $Y(t)$ is a Poisson process with parameter $\lambda$. Therefore, $B(T \mid k) = \sigma(k - 1, T)$. Define $T_k$ as the time until the server is free under the condition that there are $k \ge 1$ in line at time $t = 0$. Let $G_k$ be the distribution function of $T_k$. Then clearly $T_k > 0$ a.s. and for all $t \ge 0$,

$$G_k(t) = 1 - B(t \mid k) = 1 - \sigma(k - 1, t),$$

which may then be evaluated by Theorem 1. In particular $T_1$ represents the total busy period of the server and its distribution function is $G_1(t) = 1 - \sigma(0, t)$, which is given by the Corollary to Theorem 1.
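For this first queueing model the Corollary to Theorem 1 gives the busy-period distribution in closed form. A Python sketch follows (the function name is hypothetical). For $t < 1/a$ only the $n = 0$ term remains and $G_1(t) = 1 - e^{-\lambda t}$, the probability that the first service ends before the next arrival; and since $\sigma(0, +\infty) = 1 - \lambda/a$ for $\lambda < a$, $G_1(t)$ then approaches $\lambda/a < 1$.

```python
import math

def busy_period_cdf(t, lam, a):
    """G_1(t) = 1 - sigma(0, t) for arrivals at times n/a and exponential(lam) service times."""
    if t <= 0:
        return 0.0
    at = a * t
    sigma0 = sum(math.exp(-lam * t + n * math.log(lam * t) - math.lgamma(n + 1)) * (1 - n / at)
                 for n in range(math.floor(at) + 1))
    return 1.0 - sigma0

for t in (0.5, 2.0, 10.0, 100.0):
    print(t, busy_period_cdf(t, 0.5, 1.0))
```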

A second application is of Theorem 2 to the queueing model in which the service times are constant and equal to $1/a$ and the times between arrivals are independent random variables distributed exponentially with expectation $1/\lambda$. Let $t_i$ denote the arrival time of the $i$th person after $t = 0$. As in the above, let $T_k$ denote the time until the server is free measured from $t = 0$, when it is assumed that at $t = 0$ there are $k$ people in the queue and the server is just beginning service. Thus if $G_k$ denotes the distribution function of $T_k$, $G_k(t) = 0$ for $t < 0$, and for $t \ge 0$,


ll 


G.(t) =1 -Plus teat in 1, 2, Me] 


where $N_t$ is the number of customers arriving in $(0, t]$. Therefore,

$$G_k(t) = 1 - P[Y(u) \ge au - k,\ 0 \le u \le t] = \mu(-k, t),$$

which may be evaluated by Theorem 2. In particular $T_1$ represents the total busy period of the server and its distribution function is given by $\mu(-1, t)$. Of special interest is $1 - G_1(+\infty)$, which is the probability of the server being






busy for an infinite length of time, or equivalently, of there always being a waiting line. Since $1 - G_1(+\infty) = 1 - \mu(-1, +\infty)$, we have by Section 4 that $1 - G_1(+\infty) = 0$ whenever $a \ge \lambda$, and

$$1 - G_1(+\infty) = 1 - e^{-\lambda/a}\sum_{j=0}^{\infty}\frac{(\lambda/a)^j(j + 1)^{j-1}e^{-j\lambda/a}}{j!}$$

whenever $a < \lambda$. That $G_1(\cdot)$ is a proper distribution function only when $a \ge \lambda$ is in keeping with the known result that the recurrent event "the server is not busy" is ergodic whenever $a > \lambda$, null recurrent whenever $a = \lambda$ and transient whenever $a < \lambda$ (cf. Lindley [6]).


6. Applications of Lemma 1. The result given by Lemma 1 is of use in the theory of distribution-free statistics. The special case of (4) with $a = 1/n$ is the distribution function of the $D_n^+$-statistic. This special case is known, having been obtained by several authors (see e.g. [7]).

A slightly modified version of the $D_n^+$-statistic is

$$C_n^+ = \max_{1 \le i \le n}\Big(\frac{i}{n + 1} - U_i\Big).$$

This statistic has the same asymptotic properties as $D_n^+$ as well as some desirable small sample properties. For example, $E(i/(n + 1) - U_i) = 0$ for all $i$. Moreover, setting $U_{n+1} = 1$, $U_0 = 0$, one may write

$$W_j = \frac{1}{n + 1} - U_j + U_{j-1}, \qquad j = 1, 2, \dots, n + 1,$$

$$C_j = \sum_{i=1}^{j} W_i, \qquad j = 1, 2, \dots, n + 1.$$

It is known that the random variables $(W_1, \dots, W_{n+1})$ are symmetrically dependent. Therefore by a result of Andersen [8]

$$P\Big[\max_{0 \le i \le n}\Big(\frac{i}{n + 1} - U_i\Big) = \frac{j}{n + 1} - U_j\Big] = \frac{1}{n + 1}, \qquad j = 0, 1, \dots, n.$$


That is to say, the probability that the maximum should occur at the $j$th observation is independent of $j$. This is not true for $D_n^+$, as was shown in [2]. The distribution function of $C_n^+$, i.e., $P[C_n^+ \le x]$, is given by $F(x : 1/(n + 1), n)$ for $x \ge -1/(n + 1)$ and is equal to zero for $x < -1/(n + 1)$.
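The equal-probability statement for the location of the maximum can be illustrated by simulation. The Python sketch below (hypothetical function name, fixed seed) records which index $j \in \{0, 1, \dots, n\}$ maximizes $j/(n + 1) - U_j$, with $U_0 = 0$; each relative frequency should be close to $1/(n + 1)$.

```python
import random

def argmax_frequencies(n, reps=100_000, seed=7):
    """Relative frequency with which each j in 0..n maximizes j/(n+1) - U_j, where U_0 = 0."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(reps):
        u = [0.0] + sorted(rng.random() for _ in range(n))
        values = [j / (n + 1) - u[j] for j in range(n + 1)]
        counts[values.index(max(values))] += 1
    return [c / reps for c in counts]

print(argmax_frequencies(4))   # each entry should be near 1/5
```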

Lemma 1 may also be used to obtain the power of the $D_n^+$ or $C_n^+$ tests against alternatives of the form $G_c(x) = cx$ for all $x \in [0, 1/c]$. That is to say, one may obtain, for example, the power of $D_n^+$ against $G_c$, namely

$$P\Big[\max_{1 \le i \le n}\Big(\frac{i}{n} - Z_i\Big) \le x\Big] = P\Big[\max_{1 \le i \le n}\Big(\frac{i}{cn} - U_i\Big) \le \frac{x}{c}\Big]. \tag{16}$$

The latter probability may be evaluated by Lemma 1 and is equal to the first probability, where $Z_i$ is the $i$th smallest component of $(V_1, \dots, V_n)$ in which







the $V_i$'s are mutually independent random variables with the common distribution function $G_c$. Similarly the power of $D_n^+$ or $C_n^+$ against alternatives of the form $G_{b,c}(x) = b + cx$ for all $x \in [0, 1/c - b/c]$ and $= 0$ for $x < 0$ may be expressed as a sum of a finite number of terms of the form (16). This generalization of (16) has recently been studied by Chapman [9] for the case $b + c = 1$.


REFERENCES 


[1] G. BAXTER AND M. D. DONSKER, "On the distribution of the supremum functional for processes with stationary independent increments," Trans. Amer. Math. Soc., Vol. 85 (1957), pp. 73-87.
[2] Z. W. BIRNBAUM AND R. PYKE, "On some distributions related to the statistic $D_n^+$," Ann. Math. Stat., Vol. 28 (1957), pp. 179-187.
[3] J. L. DOOB, Stochastic Processes, John Wiley and Sons, New York, 1953.
[4] J. V. BREAKWELL, "Minimax tests for the parameter of a Poisson process," unpublished report. Cf. Abstract, Ann. Math. Stat., Vol. 26 (1955), p. 768.
[5] H. CRAMÉR, "On some questions connected with mathematical risk," Univ. Calif. Pub. in Stat., Vol. 2 (1954), pp. 99-124.
[6] D. V. LINDLEY, "Theory of queues with a single server," Proc. Cambridge Philos. Soc., Vol. 48 (1952), pp. 277-289.
[7] Z. W. BIRNBAUM AND F. H. TINGEY, "One sided confidence contours for probability distribution functions," Ann. Math. Stat., Vol. 22 (1951), pp. 592-596.
[8] E. S. ANDERSEN, "On the fluctuations of sums of random variables," Math. Scand., Vol. 1 (1953), pp. 263-285.
[9] D. G. CHAPMAN, "A comparative study of several one-sided goodness-of-fit tests," Ann. Math. Stat., Vol. 29 (1958), pp. 655-674.
[10] A. P. DEMPSTER, "Generalized $D_n^+$ statistics," Ann. Math. Stat., Vol. 30 (1959), pp. 593-597.
[11] J. GANI AND N. U. PRABHU, "The time-dependent solution for a storage model with Poisson input," to appear in J. Math. and Mechanics.





NOTES 


A NOTE ON A CLASS OF PROBLEMS IN 'NORMAL' MULTIVARIATE ANALYSIS OF VARIANCE¹


By S. N. ROY AND J. ROY
University of North Carolina 


Summary. Let the columns of $X(p \times n)$ be independent non-singular $p$-dimensional normal variates with a common variance-covariance matrix and expectations given by

$$\mathcal{E}X' = A\xi,$$

where $A(n \times m)$ is a matrix of known constants and $\xi(m \times p)$ is a matrix of unknown parameters. This will be called the model. Under this model consider the hypothesis

$$\mathcal{H}: \xi = B\eta,$$

where $B(m \times k)$ is a given matrix of constants and $\eta(k \times p)$ is a matrix of unknown parameters.

It is shown that the hypothesis $\mathcal{H}$ is "completely testable" if and only if

$$\operatorname{rank} A + \operatorname{rank} B - \operatorname{rank} AB = m.$$

Further, if $\operatorname{rank} A \le n - p$, it is always possible to construct a testable hypothesis $\mathcal{H}^*$ which is implied by $\mathcal{H}$; the test criterion proposed for $\mathcal{H}^*$ is based on the latent roots of the matrix $S_2S_1^{-1}$, where $S_1$ and $(S_1 + S_2)$ are the "error matrices of sums of squares and products" under the model and under $\mathcal{H}$, respectively. It is further shown that the rank of the matrix $S_2$ is $\min[p, \operatorname{rank} A - \operatorname{rank}(AB)]$.

Let $X(p \times n)$ be a matrix of random variables, the columns of which are independent $p$-dimensional normal variates with the same positive-definite variance-covariance matrix $\Sigma(p \times p)$ and with expectations given by

$$\mathcal{E}X' = A\xi, \tag{1}$$

where $A(n \times m)$ is a matrix of known constants and $\xi(m \times p)$ is a matrix of unknown parameters.

Let the rank of the matrix $A$ be $r$. We shall assume that $r \le \min(m, n - p)$. Without loss of generality, the first $r$ columns of $A$ may be taken to be linearly independent and so to form a basis of $A$. Then [2] we can partition and factorize $A$ in the form:

$$A = [A_1(n \times r) : A_2(n \times (m - r))] = L'(n \times r)[T_1(r \times r) : T_2(r \times (m - r))], \tag{2}$$


Received December 3, 1957. 

¹ This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of the United States Government.





























where $A_1$, $L$ and $T_1$ are matrices each of rank $r$, $T_1$ being triangular and $L$ semi-orthogonal, that is

$$LL' = I(r \times r). \tag{3}$$

It is well known [2] that the error matrix of sums of squares and products is given by

$$S_1 = XEX', \tag{4}$$

where

$$E = I - A_1(A_1'A_1)^{-1}A_1' = I - L'L, \tag{5}$$

and that this $E$ is an $n \times n$ matrix of rank $n - r$. By our assumption that $r \le \min(m, n - p)$, the matrix $S_1$ is, a.e., non-singular.

Consider, now, the hypothesis that the parameters $\xi$ can be expressed in terms of a smaller number of parameters $\eta(k \times p)$ in the form

$$\mathcal{H}: \xi = B\eta \quad (k < m),$$

where $B(m \times k)$ is a given matrix. Under $\mathcal{H}$ the expectations are given by

$$\mathcal{E}X' = AB\eta. \tag{6}$$

Let $\operatorname{rank} AB = s$. Obviously $s \le \min(r, k)$. Here again, without any loss of generality, we can regard the first $s$ columns of $AB$ to be linearly independent. The rank of the matrix $[T_1 : T_2]B$ must be $s$ [2] and it can be factorized the same way as (2). Thus,


$$[T_1 : T_2]B = M'(r \times s)[U_1(s \times s) : U_2(s \times (k - s))], \tag{7}$$

where the matrices $M$ and $U_1$ are each of rank $s$, $U_1$ is triangular and $M$ semi-orthogonal, that is,

$$MM' = I(s \times s). \tag{8}$$

We thus have

$$AB = (ML)'(U_1 : U_2), \tag{9}$$

where $(ML)'(n \times s)$ is seen to be semi-orthogonal. Using (5) it immediately follows that the error matrix of sums of squares and products under the hypothesis is given by

$$XE_{\mathcal{H}}X', \tag{10}$$








where

$$E_{\mathcal{H}} = I - L'M'ML, \tag{11}$$

and that this $E_{\mathcal{H}}$ is an $n \times n$ matrix of rank $n - s$.

Let us choose a matrix $N((r - s) \times r)$ which is an orthogonal completion of $M$; that is,

$$NN' = I((r - s) \times (r - s)) \quad\text{and}\quad NM' = 0. \tag{12}$$







The difference of the error matrices (10) and (5) is the hypothesis matrix of sums of squares and products, and is given by

$$S_2 = XHX', \tag{13}$$

where

$$H = E_{\mathcal{H}} - E = L'L - L'M'ML = L'N'NL. \tag{14}$$

Using (3) it is easily checked that

$$EH = 0. \tag{15}$$

Thus the matrices $E$ and $H$ are orthogonal and $S_2$ is, a.e., of rank $\min(p, r - s)$.

It will now be shown that the matrix $S_2$ is the appropriate hypothesis matrix of sums of squares and products for testing a hypothesis $\mathcal{H}^*$ which is testable [2] and will be introduced presently. It will be shown that, in general, the hypotheses $\mathcal{H}$ and $\mathcal{H}^*$ are not identical; though $\mathcal{H}$ implies $\mathcal{H}^*$, the converse is not generally true.


Let the rank of the matrix $B$ be $t$. Then we can find a matrix $C((m - t) \times m)$ of rank $(m - t)$ such that

$$CB = 0. \tag{16}$$

Since the row vectors of $C$ generate the vector space completely orthogonal to that generated by the column vectors of $B$, it follows that, if $C^*$ is any other matrix such that

$$C^*B = 0, \tag{17}$$

we can factorize $C^*$ in the form

$$C^* = DC. \tag{18}$$
Define the matrix $C^*((r - s) \times m)$ by

$$C^* = N(T_1 : T_2), \tag{19}$$

with $T_1$, $T_2$ defined by (2) and $N$ defined by (12). Notice that this $C^*$ is of rank $r - s$. Then

$$C^*B = N[T_1 : T_2]B = NM'[U_1 : U_2] = 0,$$

because of (12). Thus, for the matrix $C^*$, the relation (17) holds and consequently a matrix $D$ exists which satisfies (18).

It is easily seen that on elimination of $\eta$ by pre-multiplication by $C$ the hypothesis $\mathcal{H}$ may be expressed in the equivalent form

$$\mathcal{H}: C\xi = 0. \tag{20}$$

Pre-multiplication by $D$ gives

$$\mathcal{H}^*: C^*\xi = 0. \tag{21}$$

Note that $D$ is a matrix of the form $(r - s) \times (m - t)$ of rank $(r - s)$.













Obviously $\mathcal{H}$ implies $\mathcal{H}^*$, but the converse is not true unless $D$ is a non-singular matrix of form $(m - t) \times (m - t)$. A necessary and sufficient condition for this is that $r - s = m - t$ or, in words, that

$$\operatorname{rank}(A) + \operatorname{rank}(B) - \operatorname{rank}(AB) = m. \tag{22}$$

Now partition $C^*$ in the form $C^* = [C_1 : C_2]$, where

$$C_1 = NT_1 \quad\text{and}\quad C_2 = NT_2, \tag{23}$$

so that

$$C_2 = C_1T_1^{-1}T_2. \tag{24}$$

Note that (23) is precisely the condition that the hypothesis $\mathcal{H}^*$ is testable [2]. The hypothesis matrix of sums of squares and products for testing $\mathcal{H}^*$ (which is testable) computed directly from the formula given in [2] turns out to be

$$S_2^* = XH^*X', \tag{25}$$

where

$$\begin{aligned}
H^* &= A_1(A_1'A_1)^{-1}C_1'[C_1(A_1'A_1)^{-1}C_1']^{-1}C_1(A_1'A_1)^{-1}A_1'\\
&= L'T_1'^{-1}C_1'[C_1T_1^{-1}T_1'^{-1}C_1']^{-1}C_1T_1^{-1}L \qquad \text{(using (2))}\\
&= L'N'(NN')^{-1}NL \qquad \text{(using (23))}\\
&= L'N'NL \qquad \text{(using (12))}\\
&= H \qquad \text{(from (14))}.
\end{aligned}$$

Thus, we have

$$S_2^* = S_2. \tag{26}$$


An important special case is where we have $n > m > k$ and $\operatorname{rank} A = m$ and $\operatorname{rank} B = k$. In this case, $\operatorname{rank} AB = k$. Consequently the condition (22) is satisfied and the hypotheses $\mathcal{H}$ and $\mathcal{H}^*$ are identical.
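Condition (22) involves only ranks, so it can be checked exactly for a given design. The Python sketch below (hypothetical helper names; exact rational arithmetic via the standard `fractions` module) applies it to two illustrative designs of our own choosing: a one-way layout with full-rank $A$ and the hypothesis of equal means, and an over-parametrized layout with columns $(\mu, \alpha_1, \alpha_2)$ and the hypothesis $\alpha_1 = \alpha_2 = 0$. The second design fails (22); there the testable hypothesis $\mathcal{H}^*$ reduces to $\alpha_1 - \alpha_2 = 0$.

```python
from fractions import Fraction

def rank(M):
    """Exact rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def completely_testable(A, B):
    """Condition (22): rank A + rank B - rank AB = m, where m is the number of columns of A."""
    return rank(A) + rank(B) - rank(matmul(A, B)) == len(A[0])

A1 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]  # one-way layout, m = 3
B1 = [[1], [1], [1]]                                                    # H: all three means equal
A2 = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]                        # columns (mu, alpha1, alpha2)
B2 = [[1], [0], [0]]                                                    # H: alpha1 = alpha2 = 0
print(completely_testable(A1, B1), completely_testable(A2, B2))        # True False
```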

The statistical criterion for testing the hypothesis $\mathcal{H}^*$ would be based on the latent roots of the matrix $S_2S_1^{-1}$, the particular critical region proposed here [1, 2] being given by

$$c_{\max}[S_2S_1^{-1}] \ge \lambda_\alpha(t^*, r - s, n - r), \tag{27}$$

where $c_{\max}[S_2S_1^{-1}]$ denotes the largest characteristic root of the matrix $S_2S_1^{-1}$, all of whose roots are non-negative and, a.e., $t^*$ of whose roots are positive, $t^* = \min(p, r - s)$, and $\lambda_\alpha(t^*, r - s, n - r)$ is a constant, depending upon $t^*$, $r - s$, $n - r$ and the size $\alpha$ of the critical region, which can be obtained since the distribution is known and the percentage points are being tabulated.

If $p = 1$ we have the univariate problem, in which case (27) is replaced by a $\beta$-critical region or, after a little transformation, by an $F$-critical region.




REFERENCES 


[1] Roy, S. N., "On a heuristic method of test construction and its use in multivariate analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.

[2] Roy, S. N., "A report on some aspects of multivariate analysis," North Carolina Institute of Statistics Mimeograph Series No. 121 (1954).




NOTE ON AN APPLICATION OF THE SCHUMANN-BRADLEY 
TABLE 


Irwin D. J. Bross 


Cornell University Medical College 


Summary. In a recent paper [1] Schumann and Bradley present a table of the ratio of central variance ratios for use in comparing the sensitivity of experiments. The purpose of this note is to point out some other applications of the Schumann-Bradley Table I [1] which stem from a variance components model.


Model. The model employed in [1] is a "fixed effects" analysis of variance schema (henceforth abbreviated FEM). In this note a "random effects" model (REM) will be used. Let $s_{ei}^2$ and $s_{ti}^2$ be the mean squares for "error" and "treatments" in the $i$th experiment ($i = 1, 2$). Let $n_{ei}$ and $n_{ti}$ be the respective degrees of freedom, let $\sigma_{ei}^2$ and $\sigma_{ti}^2$ be the variance components, and let $K_i$ be a design constant. Finally let $\chi_{ei}^2$ and $\chi_{ti}^2$ be central chi-squares with $n_{ei}$ and $n_{ti}$ degrees of freedom respectively. Then the usual REM is

$$n_{ei}s_{ei}^2 = \sigma_{ei}^2\chi_{ei}^2; \qquad n_{ti}s_{ti}^2 = (\sigma_{ei}^2 + K_i\sigma_{ti}^2)\chi_{ti}^2. \tag{1.01}$$


Let $F_i$ be the ratio of the mean squares and let $F_{ci}$ be a central $F$ ratio with degrees of freedom corresponding to $F_i$ (i.e., $n_{ti}$, $n_{ei}$). Then for independent mean squares,

$$F_i = \frac{\sigma_{ei}^2 + K_i\sigma_{ti}^2}{\sigma_{ei}^2}\,F_{ci}. \tag{1.02}$$


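Let $w$ be the ratio of $F_1$ to $F_2$, let $w_c$ be the ratio of central variance ratios (i.e., $F_{c1}/F_{c2}$), and let

$$\psi = \frac{(\sigma_{e1}^2 + K_1\sigma_{t1}^2)/\sigma_{e1}^2}{(\sigma_{e2}^2 + K_2\sigma_{t2}^2)/\sigma_{e2}^2}; \tag{1.03}$$

then

$$w = \psi w_c. \tag{1.04}$$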


Received February 4, 1958.

For the special case $(C_0)$ where the two experiments have the same structure (i.e., $n_{e1} = n_{e2}$, $n_{t1} = n_{t2}$, $K_1 = K_2$), equation (1.04) leads immediately to exact significance tests based on Schumann-Bradley Table I [1]. In the notation of that table $n_{e1} = n_{e2} = 2b$, $n_{t1} = n_{t2} = 2a$, and the tabular quantity, $w_0$, is such that:


$$P(w_c < w_0) = 0.95. \tag{1.05}$$

Since

$$P(w < \psi w_0) = P(\psi w_c < \psi w_0) = P(w_c < w_0)$$

and since for the null hypothesis

$$H_0: \sigma_{t1}^2/\sigma_{e1}^2 = \sigma_{t2}^2/\sigma_{e2}^2$$

the value of $\psi$ is unity, it follows that:

$$P(w < w_0 \mid H_0, C_0) = 0.95. \tag{1.06}$$

Equation (1.06) provides a one-tailed significance test at the 5% level. A two-tailed test at the 10% level is obtained by using the fact that

$$P(w < w_0 \mid H_0, C_0) = P(1/w_0 < w \mid H_0, C_0).$$

Corresponding confidence intervals on $\psi$ are readily obtained.
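Under $C_0$ the test and the interval for $\psi$ follow from $w = \psi w_c$ and $P(1/w_0 < w_c < w_0) = 0.90$, which gives the 90% interval $(w/w_0,\ w\,w_0)$. A small Python sketch (hypothetical function names), applied to the canned-tomato figures quoted later in this note ($w = 3.53$, $w_0 = 4.12$):

```python
def two_tailed_reject(w, w0):
    """Two-tailed 10% test of H0 under C0: reject when w > w0 or w < 1/w0."""
    return w > w0 or w < 1.0 / w0

def psi_interval(w, w0):
    """90% confidence interval (w/w0, w*w0) for psi, from w = psi * w_c and P(1/w0 < w_c < w0) = 0.90."""
    return (w / w0, w * w0)

print(two_tailed_reject(3.53, 4.12))   # False
print(psi_interval(3.53, 4.12))        # the interval contains 1
```

The decision and the interval agree: $H_0$ ($\psi = 1$) is rejected exactly when 1 falls outside $(w/w_0,\ w\,w_0)$.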
In the event that the two experiments do not have the same structure there are both theoretical and practical difficulties. For $K_1 \ne K_2$ the value of $\psi$ under $H_0$ depends on a nuisance parameter. If $n_{t1} \ne n_{t2}$ or $n_{e1} \ne n_{e2}$, then Table I [1] cannot be used, and it is unlikely that a corresponding table will be worked out since it would be a quadruple entry table.
An approximate test in the general case can be obtained by letting 
a on + Ki ois + oe +, Ke o%2 a. 


> 2 2 
Te Fre Oe oe2 


9 
Tt Fe2 


An algebraic identity for y is 


— K. K, 
(1.07) y= Ai ou + Ae s 

K, @+ Kop 
The parameter » may be estimated by F; + F, — 1 and, under Hy, ¢ = 1 so 
that a numerical estimate of $\psi$, $\psi_0$, may be calculated. If the confidence interval on $\psi$ does not include $\psi_0$, the null hypothesis would be rejected.


Applications. Hypotheses about @ may be of interest in genetic experiments 
and in the choice of sampling plans [2]. The REM model may also apply in 
some comparisons of the sensitivities of experiments—for example in situations 
where different sets of treatments are used in the two experiments but both sets 
are regarded as random samples from the same population of treatments. 

The distinction between FEM and REM tests is important here because they can lead to different conclusions. As a numerical illustration of this point consider the data on the rating of canned tomatoes by two judges used by Schumann and Bradley in [3]. For these data $n_e = 24$, $n_t = 8$, and $w = 3.53$. Using Table I [1], the critical value $w_0$ is 4.12 for the REM test, so that $w$ is not significant at the 5% level. For the FEM test recommended in [3] the critical value is less than 2.87, so that $w$ is significant. In general the critical values for the FEM test will be smaller, but there is an "ultraconservative" [3] FEM test which is the same as the REM test.

Whether a FEM or a REM test is appropriate is a tricky question involving the context of the study. Thus, since the same set of treatments was used in the comparison of judges, a FEM test would seem to be implied. However, as noted in [3], there is a possible judge × treatment interaction to consider. In other words, the judges might make self-consistent subjective ratings but there might be little correlation between the two sets of ratings. If this happened the ratings would behave more or less as if two different sets of treatments were used (implying a REM test). Actually the judge × treatment interaction does not appear to be serious, so the FEM test seems more appropriate. However, this is an issue which must be carefully considered in each specific application.


REFERENCES


[1] D. E. W. Schumann and R. A. Bradley, "The comparison of the sensitivities of similar experiments: Theory," Ann. Math. Stat., Vol. 28 (1957), p. 902.

[2] R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill Book Co., New York, 1952, p. 313.

[3] D. E. W. Schumann and R. A. Bradley, "The comparison of the sensitivities of similar experiments: Applications," Biometrics, Vol. 13 (1957), p. 496.




ON A THEOREM OF LEVY-RAIKOV 


By A. Devinatz¹
Washington University 

THEOREM OF LÉVY-RAIKOV. If $\varphi_1$, $\varphi_2$ are characteristic functions and $\varphi = \varphi_1\varphi_2$ is analytic, then so are $\varphi_1$ and $\varphi_2$, and the strips in the complex plane where $\varphi_1$ and $\varphi_2$ may be extended analytically are at least as large as the strip where $\varphi$ may be extended analytically.

This theorem was originally proved by P. Lévy [2, 3] for the case where $\varphi$ may be extended analytically over the entire plane, and by Raikov [5]. Another very simple proof may be found in [1].

The purpose of this note is to give a sharpened version of this result. 

THEOREM. If $\varphi_1$, $\varphi_2$ are characteristic functions and $\varphi = \varphi_1\varphi_2$ is differentiable $2n$ times, then so are $\varphi_1$ and $\varphi_2$. For any real $a$ let $\psi_a(x) = e^{iax}\varphi(x)$; then there exist numbers $a_j$, $m_j$ such that

$$
|\varphi_j^{(2k)}(0)| \le m_j\,|\psi_{a_j}^{(2k)}(0)|,\qquad j = 1, 2,\quad 0 \le k \le n. \tag{1}
$$

If $\varphi$ is infinitely differentiable and the Hamburger moment sequence $\{(-i)^n\varphi^{(n)}(0)\}_{n=0}^{\infty}$ is determined, then so are the Hamburger moment sequences $\{(-i)^n\varphi_j^{(n)}(0)\}_{n=0}^{\infty}$, $j = 1, 2$.

Received July 28, 1958.
¹ Research supported by a National Science Foundation grant.
PROOF. Let

$$
\varphi_j(x) = \int e^{ixt}\, dF_j(t),\qquad j = 1, 2;
$$

then

$$
\varphi(x) = \iint e^{ix(t+r)}\, dF_1(t)\, dF_2(r).
$$


We shall suppose that the intervals $(-\infty, 0]$ and $[0, \infty)$ both have non-zero measure with respect to $dF_2$. If this is not the case, then by a suitable choice of a real number $a$, the measure corresponding to the characteristic function $e^{iax}\varphi_2(x)$ has this property.² We would then work with the functions $\psi_a(x)$, $\varphi_1(x)$ and $e^{iax}\varphi_2(x)$.
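Let us set (see [4])

$$
\Delta_h\varphi(x) = \varphi(x + h) - \varphi(x - h),\qquad \Delta_h^n\varphi(x) = \Delta_h\Delta_h^{n-1}\varphi(x).
$$

We get

$$
\frac{1}{(2h)^{2n}}\,\Delta_h^{2n}\varphi(0) = (-1)^n\iint\Big[\frac{\sin h(t + r)}{h}\Big]^{2n} dF_1(t)\, dF_2(r).
$$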
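If $\varphi^{(2n)}(0)$ exists, then the left side approaches this value as $h \to 0$, and hence the integral on the right is uniformly bounded as $h \to 0$. By Fatou's lemma the integral of the limit $(t + r)^{2n}$ is therefore finite, and this gives

$$
(-1)^n\varphi^{(2n)}(0) = \iint (t + r)^{2n}\, dF_1(t)\, dF_2(r),
$$

and from this it follows immediately that

$$
(-i)^k\varphi^{(k)}(0) = \iint (t + r)^k\, dF_1(t)\, dF_2(r),\qquad 0 \le k \le 2n. \tag{2}
$$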
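As a numerical illustration of the limit used above (not part of the original note), the symmetric difference quotient can be evaluated directly. The sketch assumes the illustrative factors $\varphi_1(x) = e^{-x^2/2}$ (standard normal) and $\varphi_2(x) = \cos x$ (a fair $\pm 1$ variable), for which $\iint (t+r)^2\,dF_1\,dF_2 = 2$ and $\iint (t+r)^4\,dF_1\,dF_2 = 3 + 6 + 1 = 10$; the function names are hypothetical.

```python
import math


def phi(x):
    """phi = phi_1 * phi_2 with phi_1(x) = exp(-x^2/2) and phi_2(x) = cos(x) (illustrative)."""
    return math.exp(-x * x / 2.0) * math.cos(x)


def symmetric_difference(f, m, h):
    """Delta_h^m f(0) for Delta_h f(x) = f(x + h) - f(x - h),
    i.e. sum_j (-1)^j C(m, j) f((m - 2j) h)."""
    return sum((-1) ** j * math.comb(m, j) * f((m - 2 * j) * h) for j in range(m + 1))


def even_moment_estimate(f, n, h=1e-3):
    """(-1)^n (2h)^(-2n) Delta_h^(2n) f(0); tends to the integral of (t + r)^(2n) as h -> 0."""
    return (-1) ** n * symmetric_difference(f, 2 * n, h) / (2 * h) ** (2 * n)


for n, exact in [(1, 2.0), (2, 10.0)]:
    print(n, even_moment_estimate(phi, n), exact)
```

The step $h$ cannot be taken much smaller for $n = 2$: the difference of nearly equal function values then loses precision in floating point.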
If we integrate over a closed rectangle $0 \le t, r \le a$, we get

$$
\int_0^a\!\!\int_0^a (t + r)^{2k}\, dF_1(t)\, dF_2(r) \le |\varphi^{(2k)}(0)|.
$$

Expanding by the binomial theorem gives

$$
\lim_{a\to\infty}\int_0^a dF_2(r)\int_0^a t^{2k}\, dF_1(t) \le |\varphi^{(2k)}(0)|,
$$

since each term in the binomial expansion is non-negative. By hypothesis, the $dF_2$ measure of $[0, \infty)$ is not zero, and hence $\int_0^\infty t^{2k}\, dF_1(t)$ is bounded by a constant times $|\varphi^{(2k)}(0)|$. If we repeat this process for the closed rectangle $-a \le t, r \le 0$, and then repeat the whole process, interchanging the roles of $\varphi_1$ and $\varphi_2$, we get the first two statements of the theorem.





² We could, in fact, choose $a$ so that the corresponding measures of $(-\infty, 0]$ and $[0, \infty)$ are both $\ge 1/2$. As our proof will show, we could then take $m_j \le 2$.





To prove the last statement of the theorem we shall suppose that $\varphi$ is infinitely differentiable, the Hamburger moment sequence $\mu_n = (-i)^n\varphi^{(n)}(0)$ is determined, but the moment sequence $\nu_n = (-i)^n\varphi_1^{(n)}(0)$ is not determined. By expanding the integrand in (2) by the binomial formula we see that the left hand side remains fixed if $dF_1$ is replaced by any solution of the $\nu_n$-moment problem.
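The binomial expansion of (2) says that the moments of $dF_1 * dF_2$ depend on $dF_1$ only through its moments $\nu_j$. A minimal exact check with two small discrete distributions (chosen purely for illustration; all names are hypothetical):

```python
from fractions import Fraction
from math import comb


def moments(dist, kmax):
    """Raw moments E[X^k], k = 0..kmax, of a finite distribution given as {value: probability}."""
    return [sum(p * Fraction(x) ** k for x, p in dist.items()) for k in range(kmax + 1)]


def convolve(d1, d2):
    """Distribution of T + R for independent T ~ d1 and R ~ d2 (the measure dF_1 * dF_2)."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0) + p * q
    return out


def moments_from_factors(nu1, nu2):
    """Binomial expansion of (2): E[(T + R)^k] = sum_j C(k, j) E[T^j] E[R^(k - j)]."""
    return [sum(comb(k, j) * nu1[j] * nu2[k - j] for j in range(k + 1))
            for k in range(len(nu1))]


F1 = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}  # illustrative dF_1
F2 = {-1: Fraction(1, 2), 1: Fraction(1, 2)}                     # illustrative dF_2
K = 6
direct = moments(convolve(F1, F2), K)
via_binomial = moments_from_factors(moments(F1, K), moments(F2, K))
print(direct == via_binomial)
```

Because only the lists `moments(F1, K)` and `moments(F2, K)` enter the right-hand side, any other measure with the same moments as `F1` would produce the same left-hand side.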

By (2) it is clear that the unique solution $dF$ of the $\mu_n$-moment problem is given by

$$
dF = dG_1 * dG_2,
$$

where $dG_1 * dG_2$ is the convolution of the measures $dG_1$ and $dG_2$ for any solutions of the corresponding moment problems. The characteristic function of $dF$ is the product of the characteristic functions of $dG_1$ and $dG_2$ respectively. If $dG_2$ is fixed and $dG_1$ changes, then the characteristic function of $dF$ must change, and hence $dF$ must change, which contradicts the initial hypothesis. Hence the assumption that the $\nu_n$-moment problem is not determined is untenable.


COROLLARY. The previous theorem includes the Lévy-Raikov theorem. Moreover, if $\mu_n = (-i)^n\varphi^{(n)}(0)$ and

$$
\sum_{n=1}^{\infty}\frac{1}{(\mu_{2n})^{1/(2n)}} = \infty,
$$

the same is true for the moments corresponding to $\varphi_1$ and $\varphi_2$.

PROOF. The first statement follows by the formula (1) and the fact that if $\varphi$ is analytic, so is $\psi_a$ for any $a$.

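To prove the second statement we note that if we set $\mu_n(a) = (-i)^n\psi_a^{(n)}(0)$, then for $n$ even, using $|\mu_{n-k}| \le \mu_n^{(n-k)/n}$ (Lyapunov's inequality),

$$
\mu_n(a) = \sum_{k=0}^{n}\binom{n}{k}\mu_{n-k}\,a^k \le \sum_{k=0}^{n}\binom{n}{k}\mu_n^{(n-k)/n}|a|^k = \big[\mu_n^{1/n} + |a|\big]^n.
$$

Since $\mu_{2n}(a)^{1/(2n)} \le \mu_{2n}^{1/(2n)} + |a|$ and $\mu_{2n}^{1/(2n)}$ is non-decreasing in $n$, it follows that

$$
\sum_{n=1}^{\infty}\frac{1}{(\mu_{2n}(a))^{1/(2n)}} = \infty,
$$

and our assertions follow from the inequalities (1). This proves the corollary.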
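The bound $\mu_n(a) \le [\mu_n^{1/n} + |a|]^n$ can be spot-checked numerically. The sketch below assumes a standard normal law for the moments $\mu_n$ (an illustrative choice, not one made in the note) and evaluates $\mu_n(a)$ by the binomial formula above; the function names are hypothetical.

```python
from math import comb, prod


def normal_moment(j):
    """E[X^j] for X ~ N(0, 1): 0 for odd j and (j - 1)!! for even j."""
    return 0 if j % 2 else prod(range(1, j, 2))


def shifted_moment(n, a):
    """mu_n(a) = sum_k C(n, k) mu_{n-k} a^k, the n-th moment belonging to psi_a."""
    return sum(comb(n, k) * normal_moment(n - k) * a ** k for k in range(n + 1))


# Check mu_n(a) <= [mu_n^(1/n) + |a|]^n for even n (small relative slack for rounding).
for n in range(2, 21, 2):
    for a in (-3.0, -0.5, 0.7, 2.0):
        bound = (normal_moment(n) ** (1.0 / n) + abs(a)) ** n
        assert shifted_moment(n, a) <= bound * (1 + 1e-12), (n, a)
print("bound holds for all tested (n, a)")
```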
Finally, we remark that the inequalities in (1) are the best possible in the sense that there exist $\varphi_j$, $j = 1, 2$, such that $(-i)^n\varphi_j^{(n)}(0) = (-i)^n\psi_{a_j}^{(n)}(0)$ for even $n$ and some real $a_j$. For example, let $\varphi_1(x) = e^{ix}$, $\varphi_2(x) = e^{-ix}$.


REFERENCES


[1] D. Dugué, "Analyticité et convexité des fonctions caractéristiques," Ann. Inst. Henri Poincaré, Vol. 12 (1951), pp. 45-56.

[2] Paul Lévy, "L'arithmétique des lois de probabilité," C. R. Acad. Sci. Paris, Vol. 204 (1937), pp. 80-82.

[3] Paul Lévy, "L'arithmétique des lois de probabilité," J. Math. Pures Appl., Vol. 103 (1938), pp. 17-40.

[4] Eugene Lukacs and Otto Szász, "On analytic characteristic functions," Pacific J. Math., Vol. 2, No. 4 (1952), pp. 615-625.

[5] D. A. Raikov, "On the decomposition of Gauss and Poisson laws," Izvestiya Akad. Nauk SSSR Ser. Mat. (1938), pp. 91-124 (Russian).




NOTE ON THE FACTORIAL MOMENTS OF THE DISTRIBUTION OF 
LOCALLY MAXIMAL ELEMENTS IN A RANDOM SAMPLE 


By M. O. Glasgow
University of Texas 


0. Summary. The results reported by T. Austin, R. Fagen, T. Lehrer, and 
W. Penney [1] are extended to include a general recurrence relation for the 
factorial moments of the distribution. This recurrence relation is solved for the 
mean and second factorial moments, and it is shown that the method applied 
may also be used to obtain a general solution for any desired factorial moment 
of higher order. 


1. Introduction. Austin, Fagen, Lehrer, and Penney [1] have discussed the distribution of locally maximal elements in a random sample. Among other results, the authors defined certain elements in an ordered random sample of $n$ distinct real numbers to be locally $k$-maximal, provided such an element is the greatest of some set of $k$ consecutive elements of the sample. Denoting by $f_k(n, t)$ the number of sequences of the first $n$ positive integers which have exactly $t$ elements which are locally $k$-maximal, and defining a generating function $v_k(x, y)$,

$$
v_k(x, y) = \sum_{\alpha}\sum_{\beta} f_k(\alpha, \beta)\,\frac{x^\alpha y^\beta}{\alpha!}, \tag{1.1}
$$

a recurrence relation and partial differential equation were then derived:

$$
f_k(n + 1, r + 1) = \sum_{m}\sum_{t}\binom{n}{m} f_k(m, t)\, f_k(n - m, r - t),\qquad n \ge k - 1, \tag{1.2}
$$

$$
\frac{\partial v_k}{\partial x} = y v_k^2 + (1 - y)\sum_{t=0}^{k-2}(t + 1)x^t. \tag{1.3}
$$

Unless specified otherwise, the range of a summation variable may be taken as $(0, +\infty)$ in these and the subsequent sums.

The relations (1.1) and (1.3) may be employed to obtain a general recurrence 
relation for the factorial moments of the distribution. Information on such 
moments would be useful in any application of the distribution as a non-para- 
metric test, and would generally be of value in characterizing the distribution. 


Received July 5, 1958.

2. Recurrence relation for the factorial moments of the distribution. Let the $r$-th factorial of $\beta$ be defined as $\beta^{(r)} = \beta(\beta - 1)\cdots(\beta - r + 1)$, with $\beta^{(0)} = 1$, $\beta^{(1)} = \beta$. The expected value of $\beta^{(r)}$ for $n$ and $k$ held fixed may be denoted as $E(\beta^{(r)}, n)$, such that

$$
E(\beta^{(r)}, n) = \sum_{\beta}\beta^{(r)} f_k(n, \beta)/n!. \tag{2.1}
$$


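With this notation, the defining relation (1.1) for $v_k$ may be differentiated and evaluated at $y = 1$, with the result

$$
\begin{aligned}
\frac{\partial^{r+1} v_k}{\partial x\,\partial y^r}\bigg|_{y=1}
&= \sum_{\alpha}\sum_{\beta}(\alpha + 1)\,\beta^{(r)} f_k(\alpha + 1, \beta)\,\frac{x^\alpha}{(\alpha + 1)!}\\
&= \sum_{\alpha}(\alpha + 1)\,E(\beta^{(r)}, \alpha + 1)\,x^\alpha.
\end{aligned}\tag{2.2}
$$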


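A different representation of this same function may be obtained by differentiating (1.3), making use of (1.1), (2.1), and Leibniz' expansion for the derivative of a product. There results

$$
\begin{aligned}
\frac{\partial^{r+1} v_k}{\partial x\,\partial y^r}\bigg|_{y=1}
&= \frac{\partial^r}{\partial y^r}\Big[y v_k^2 + (1 - y)\sum_{t=0}^{k-2}(t + 1)x^t\Big]_{y=1}\\
&= \sum_{\alpha}\sum_{\gamma} x^{\alpha+\gamma}\sum_{\nu}\Big[\binom{r}{\nu}E(\beta^{(\nu)}, \alpha)E(\beta^{(r-\nu)}, \gamma) + r\binom{r-1}{\nu}E(\beta^{(\nu)}, \alpha)E(\beta^{(r-1-\nu)}, \gamma)\Big]\\
&\quad + \frac{d^r(1 - y)}{dy^r}\bigg|_{y=1}\sum_{t=0}^{k-2}(t + 1)x^t.
\end{aligned}\tag{2.3}
$$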
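The boundary values of the $E(\beta^{(r)}, n)$ include the following:

$$
\begin{aligned}
E(\beta^{(0)}, n) &= 1, & n &\ge 0,\\
E(\beta^{(r)}, n) &= 0, & n &< 0,\\
E(\beta^{(r)}, n) &= 0, & r &> n,\\
E(\beta^{(r)}, k) &= 1, & r &= 0, 1,\\
E(\beta^{(r)}, k) &= 0, & r &> 1.
\end{aligned}
$$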
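Upon equating the coefficients of $x^{k+S}$, $S \ge -1$, in relations (2.2) and (2.3), after making some simple substitution changes on the dummy variables, there results

$$
\begin{aligned}
(k + S + 1)E(\beta^{(r)}, k + S + 1) = \sum_{\gamma}\sum_{\nu}\Big[&\binom{r}{\nu}E(\beta^{(\nu)}, k + S - \gamma)E(\beta^{(r-\nu)}, \gamma)\\
&+ r\binom{r-1}{\nu}E(\beta^{(\nu)}, k + S - \gamma)E(\beta^{(r-1-\nu)}, \gamma)\Big],\qquad r > 0,\ S \ge -1.
\end{aligned}\tag{2.5}
$$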


The relation (2.5) is a general recurrence relation for the factorial moments 
of the distribution. Computation of these moments may be expedited by use of 
the recurrence relation. 
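Recurrence (2.5) can be turned directly into a memoised computation with exact rational arithmetic. The sketch below is illustrative only: besides the boundary values listed above, it assumes $E(\beta^{(r)}, n) = 0$ for $r \ge 1$ and $0 \le n < k$ (fewer than $k$ elements contain no run of $k$ consecutive elements), which is what (3.1) gives at $S = -1$. The function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def factorial_moments(k):
    """Return E(r, n) = E(beta^(r), n) for locally k-maximal elements, computed from (2.5)."""

    @lru_cache(maxsize=None)
    def E(r, n):
        if n < 0:
            return Fraction(0)
        if r == 0:
            return Fraction(1)
        if n < k:
            return Fraction(0)  # no run of k consecutive elements exists
        s = n - k - 1  # write n = k + S + 1 with S >= -1
        total = Fraction(0)
        for g in range(k + s + 1):  # gamma = 0, ..., k + S
            a = k + s - g
            for v in range(r + 1):
                total += comb(r, v) * E(v, a) * E(r - v, g)
            for v in range(r):
                total += r * comb(r - 1, v) * E(v, a) * E(r - 1 - v, g)
        return total / n  # divide by k + S + 1

    return E


E2 = factorial_moments(2)
print([E2(1, n) for n in range(2, 7)])  # means for k = 2
print([E2(2, n) for n in range(2, 7)])  # second factorial moments for k = 2
```

For $k = 1$ every element is locally 1-maximal, so $E(\beta^{(r)}, n)$ must reduce to $n(n-1)\cdots(n-r+1)$; this is a convenient sanity check on the recurrence.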


3. Solution of recurrence relation. For the special case $r = 1$, the relation (2.5) reduces to the following:

$$
(k + S + 1)E(\beta^{(1)}, k + S + 1) = (k + S + 1) + 2\sum_{\gamma}E(\beta^{(1)}, k + S - \gamma). \tag{3.1}
$$

This equation has the general solution $E(\beta^{(1)}, k + S) = (k + 2S + 1)/(k + 1)$, $S \ge 0$, in agreement with reference [1]. The solution may be proved by induction, or by the use of a method we shall employ for the case $r = 2$.
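The closed form for the mean can also be confirmed by brute force for small $n$. The sketch below enumerates all $n!$ orderings of $1, \ldots, n$ and counts locally $k$-maximal elements directly from the definition in the Introduction; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def locally_k_maximal(seq, k):
    """Count elements of seq that are the largest of some run of k consecutive elements."""
    winners = set()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        winners.add(start + window.index(max(window)))
    return len(winners)


def mean_by_enumeration(n, k):
    """E(beta^(1), n): the average count over all n! orderings of 1, ..., n."""
    total = sum(locally_k_maximal(p, k) for p in permutations(range(1, n + 1)))
    return Fraction(total, factorial(n))


def mean_closed_form(n, k):
    """(k + 2S + 1)/(k + 1) with n = k + S, S >= 0."""
    S = n - k
    return Fraction(k + 2 * S + 1, k + 1)


for k in (1, 2, 3):
    for n in range(k, 8):
        assert mean_by_enumeration(n, k) == mean_closed_form(n, k), (n, k)
print("closed form agrees with enumeration")
```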
For the special case $r = 2$, relation (2.5) reduces to the form

$$
(k + S + 1)E(\beta^{(2)}, k + S + 1) = \sum_{\alpha}\big[4E(\beta^{(1)}, k + S - \alpha) + 2E(\beta^{(2)}, k + S - \alpha) + 2E(\beta^{(1)}, k + S - \alpha)E(\beta^{(1)}, \alpha)\big]. \tag{3.2}
$$

The last term in the right member contributes only if $S \ge k$. For this reason the cases $-1 \le S \le k - 1$ and $S \ge k$ will be considered separately.
If $-1 \le S \le k - 1$, then (3.2) reduces to

$$
(k + S + 1)E(\beta^{(2)}, k + S + 1) = \sum_{\alpha}\big[4E(\beta^{(1)}, k + S - \alpha) + 2E(\beta^{(2)}, k + S - \alpha)\big]. \tag{3.3}
$$


On subtracting from this equation the similar equation resulting when $S$ is replaced by $S - 1$, and making use of the general solution for the mean, a difference equation results,

$$
(k + S + 1)E(\beta^{(2)}, k + S + 1) - (k + S + 2)E(\beta^{(2)}, k + S) = 4(k + 2S + 1)/(k + 1). \tag{3.4}
$$

This difference equation may be solved by making the substitution $E(\beta^{(2)}, k + r) = (k + r + 1)H(k + r)$, with boundary value $H(k) = 0$. The solution is

$$
E(\beta^{(2)}, k + S + 1) = \frac{4(k + S + 2)}{k + 1}\sum_{t=0}^{S}\frac{k + 2t + 1}{(k + t + 1)(k + t + 2)},\qquad -1 \le S \le k - 1. \tag{3.5}
$$


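If $S \ge k$, set $S = k + m$, $m \ge 0$. The recurrence relation (3.2) takes the form

$$
(2k + m + 1)E(\beta^{(2)}, 2k + m + 1) = \sum_{\alpha}\big[4E(\beta^{(1)}, 2k + m - \alpha) + 2E(\beta^{(2)}, 2k + m - \alpha) + 2E(\beta^{(1)}, 2k + m - \alpha)E(\beta^{(1)}, \alpha)\big]. \tag{3.6}
$$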


On subtracting from this equation the similar equation with $m$ replaced by $m - 1$, and making use of the general solution for the mean, the following difference equation results:

$$
\begin{aligned}
(2k + m + 1)&E(\beta^{(2)}, 2k + m + 1) - (2k + m + 2)E(\beta^{(2)}, 2k + m)\\
&= 4(3k + 2m + 1)/(k + 1) + 2\sum_{\alpha}\big[E(\beta^{(1)}, \alpha)E(\beta^{(1)}, 2k + m - \alpha) - E(\beta^{(1)}, \alpha)E(\beta^{(1)}, 2k + m - 1 - \alpha)\big]\\
&= 4(3k + 4m + 1)/(k + 1) + 4m(m - 1)/(k + 1)^2 + 2,\qquad m \ge 0.
\end{aligned}\tag{3.7}
$$


This difference equation may be solved by placing $E(\beta^{(2)}, 2k + r) = (2k + r + 1)H(2k + r)$, with the boundary value

$$
H(2k) = \frac{E(\beta^{(2)}, 2k)}{2k + 1} = \frac{4}{k + 1}\sum_{S=0}^{k-1}\frac{k + 2S + 1}{(k + S + 1)(k + S + 2)}.
$$

The resulting solution to (3.7) is

$$
E(\beta^{(2)}, 2k + m + 1) = (2k + m + 2)\Big\{H(2k) + \sum_{S=0}^{m}\frac{4(3k + 4S + 1)/(k + 1) + 4S(S - 1)/(k + 1)^2 + 2}{(2k + S + 1)(2k + S + 2)}\Big\} \tag{3.8}
$$

for $m \ge 0$.


These results were verified by the use of the numerical examples included in reference [1]. For larger values of the arguments, computation of the finite sums in the expressions for the factorial moments may be expedited by expanding the summands into partial fraction form, resulting in expressions which could be evaluated by the use of tables of the logarithmic derivative of the Gamma function, according to methods discussed in references [2] and [3]. For example, $(k + 2t + 1)/[(k + t + 1)(k + t + 2)] = (k + 3)/(k + t + 2) - (k + 1)/(k + t + 1)$, so the sum in (3.5) becomes a combination of differences of values of $\Gamma'(x)/\Gamma(x)$.
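Formulas (3.5) and (3.8) are easy to evaluate exactly with rational arithmetic. A minimal sketch follows (the function name is hypothetical); for $k = 1$ every element is locally 1-maximal, so the result should equal $n(n-1)$, which gives a quick sanity check.

```python
from fractions import Fraction


def second_factorial_moment(n, k):
    """E(beta^(2), n) from (3.5) when k <= n <= 2k and from (3.8) when n >= 2k + 1."""
    if n < k:
        raise ValueError("the closed forms are stated for n >= k")
    c = Fraction(4, k + 1)

    def low_sum(S):
        # sum_{t=0}^{S} (k + 2t + 1) / ((k + t + 1)(k + t + 2)); zero when S = -1
        return sum((Fraction(k + 2 * t + 1, (k + t + 1) * (k + t + 2)) for t in range(S + 1)),
                   Fraction(0))

    if n <= 2 * k:                    # (3.5) with n = k + S + 1, -1 <= S <= k - 1
        S = n - k - 1
        return (k + S + 2) * c * low_sum(S)

    m = n - 2 * k - 1                 # (3.8) with n = 2k + m + 1, m >= 0
    H_2k = c * low_sum(k - 1)         # boundary value H(2k) = E(beta^(2), 2k)/(2k + 1)
    tail = sum(((4 * Fraction(3 * k + 4 * S + 1, k + 1)
                 + 4 * Fraction(S * (S - 1), (k + 1) ** 2) + 2)
                / ((2 * k + S + 1) * (2 * k + S + 2))
                for S in range(m + 1)), Fraction(0))
    return (2 * k + m + 2) * (H_2k + tail)


for n in range(2, 8):
    print(n, second_factorial_moment(n, 2))
```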

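Similarly, a solution of (2.5) for $E(\beta^{(r)}, k + m + 1)$ for general $r$ may be obtained in terms of the factorial moments of all orders $\le r - 1$. Such a solution is

$$
E(\beta^{(r)}, k + m + 1) = (k + m + 2)\Big[H(k) + \sum_{S=0}^{m}\frac{P_r(k, S)}{(k + S + 1)(k + S + 2)}\Big], \tag{3.9}
$$

where

$$
\begin{aligned}
P_r(k, S) ={}& \sum_{\nu=1}^{r-1}\sum_{\gamma}\binom{r}{\nu}E(\beta^{(r-\nu)}, \gamma)\big[E(\beta^{(\nu)}, k + S - \gamma) - E(\beta^{(\nu)}, k + S - 1 - \gamma)\big]\\
&+ r\sum_{\nu=0}^{r-1}\sum_{\gamma}\binom{r-1}{\nu}E(\beta^{(r-1-\nu)}, \gamma)\big[E(\beta^{(\nu)}, k + S - \gamma) - E(\beta^{(\nu)}, k + S - 1 - \gamma)\big],
\end{aligned}
$$

$H(k) = 1/(k + 1)$ for $r = 1$; $H(k) = 0$ for $r > 1$.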













4. Acknowledgment. I am grateful to Dr. Robert E. Greenwood for his advice 
and guidance. 


REFERENCES 


[1] T. Austin, R. Fagen, T. Lehrer, and W. Penney, "The distribution of the number of locally maximal elements in a random sample," Ann. Math. Stat., Vol. 28 (1957), pp. 786-790.


[2] Tomlinson Fort, Finite Differences and Difference Equations, Clarendon Press, Oxford, 1948.


[3] Clarence Hudson Richardson, An Introduction to the Calculus of Finite Differences, D. Van Nostrand, New York, 1954.




THE LIMITING JOINT DISTRIBUTION OF THE LARGEST AND
SMALLEST SAMPLE SPACINGS¹


By Lionel Weiss

Cornell University 
1. Introduction and summary. $X_1, X_2, \ldots, X_n$ are independent chance variables, each with the same distribution. This common distribution assigns all the probability to the closed interval $[0, 1]$, and has a density function $f(x)$ whose graph consists of any finite number of horizontal line segments. That is, there are $H$ non-degenerate subintervals

$$
I_1, I_2, \ldots, I_H,\qquad I_1 = [0, z_1),\ I_2 = [z_1, z_2),\ \ldots,\ I_H = [z_{H-1}, 1],
$$

and for each $x$ in $I_j$, $f(x) = a_j$. We assume that $a_j$ is positive for all $j$. Let $z_0$ denote zero, and $z_H$ denote unity. $M$ will denote $\min_j a_j$, $B$ will denote

$$
\sum_{j:\,a_j = M}(z_j - z_{j-1}),
$$

and $S$ shall denote $\int_0^1 f^2(x)\,dx = \sum_{j=1}^{H} a_j^2(z_j - z_{j-1})$.
Let $Y_1 \le Y_2 \le \cdots \le Y_n$ denote the ordered values of $X_1, \ldots, X_n$, and define $W_1 = Y_1$, $W_2 = Y_2 - Y_1$, $\ldots$, $W_n = Y_n - Y_{n-1}$, $W_{n+1} = 1 - Y_n$, $U_n = \min(W_1, \ldots, W_{n+1})$, $V_n = \max(W_1, \ldots, W_{n+1})$. In [1] it is shown that if $f(x)$ is the uniform density function over $[0, 1]$, then

$$
\lim_{n\to\infty} P\Big[U_n > \frac{u}{(n + 1)^2},\ V_n < \frac{\log(n + 1) - \log v}{n + 1}\Big] = \exp\{-(u + v)\},
$$
for any positive numbers $u$, $v$. It is easy to see that the convergence must be uniform over any bounded rectangle in the space of $u$ and $v$. In this paper it is shown that if $f(x)$ is of the type described above, then

$$
\lim_{n\to\infty} P\Big[U_n > \frac{u}{(n + 1)^2},\ V_n < \frac{\log(n + 1) + \log M - \log v}{M(n + 1)}\Big] = \exp\{-(Su + Bv)\}
$$

for any positive values $u$, $v$. This result can be used to study the asymptotic power of various tests of fit based on $U_n$ and $V_n$ which have been proposed (see [1], p. 253).

Received August 27, 1958; revised December 10, 1958.
¹ Research under contract with the Office of Naval Research.
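The limit can be explored by simulation. The sketch below is not from the paper: it assumes a hypothetical two-step density ($f = 0.5$ on $[0, 0.5)$, $f = 1.5$ on $[0.5, 1]$), for which $M = 0.5$, $B = 0.5$ and $S = 1.25$, and compares a Monte Carlo estimate of the probability on the left with $\exp\{-(Su + Bv)\}$. With $n$ around a thousand the agreement is only approximate, since the $V_n$ part converges slowly.

```python
import math
import random

# Hypothetical step density (illustration only): f = 0.5 on [0, 0.5), f = 1.5 on [0.5, 1].
STEPS = [(0.0, 0.5, 0.5), (0.5, 1.0, 1.5)]  # (z_{j-1}, z_j, a_j)


def draw(rng):
    """Inverse-c.d.f. draw: F(x) = 0.5x on [0, 0.5) and 0.25 + 1.5(x - 0.5) on [0.5, 1]."""
    t = rng.random()
    return 2.0 * t if t < 0.25 else 0.5 + (t - 0.25) / 1.5


def limit_constants(steps):
    """M = min_j a_j, B = total length of the pieces where a_j = M, S = integral of f^2."""
    M = min(a for _, _, a in steps)
    B = sum(z1 - z0 for z0, z1, a in steps if a == M)
    S = sum(a * a * (z1 - z0) for z0, z1, a in steps)
    return M, B, S


def joint_probability(u, v, n, reps, seed=1):
    """Monte Carlo estimate of P[U_n > u/(n+1)^2, V_n < (log(n+1) + log M - log v)/(M(n+1))]."""
    rng = random.Random(seed)
    M, _, _ = limit_constants(STEPS)
    lo = u / (n + 1) ** 2
    hi = (math.log(n + 1) + math.log(M) - math.log(v)) / (M * (n + 1))
    hits = 0
    for _ in range(reps):
        pts = [0.0] + sorted(draw(rng) for _ in range(n)) + [1.0]
        w = [b - a for a, b in zip(pts, pts[1:])]  # spacings W_1, ..., W_{n+1}
        hits += min(w) > lo and max(w) < hi
    return hits / reps


M, B, S = limit_constants(STEPS)
u, v = 0.5, 0.5
estimate = joint_probability(u, v, n=1000, reps=2000)
limit = math.exp(-(S * u + B * v))
print(estimate, limit)
```

With a single step ($a_1 = 1$) the constants become $M = B = S = 1$ and the comparison reduces to the uniform result quoted from [1].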


2. Derivation of the distribution. Let $N_j$ denote the number of the values $X_1, \ldots, X_n$ which fall in the subinterval $I_j$, and let ${}_jY_1 \le {}_jY_2 \le \cdots \le {}_jY_{N_j}$ be the ordered values of the observations in $I_j$. Denote $z_{j-1}$ by ${}_jY_0$, and $z_j$ by ${}_jY_{N_j+1}$. Define ${}_jW_h$ as ${}_jY_h - {}_jY_{h-1}$ for $h = 1, \ldots, N_j + 1$. The joint conditional distribution of

$$
\frac{{}_jW_1}{z_j - z_{j-1}},\ \ldots,\ \frac{{}_jW_{N_j+1}}{z_j - z_{j-1}}
$$

given $N_j$ is the same as the joint distribution of $N_j + 1$ sample spacings created by $N_j$ independent observations on a uniform distribution over $[0, 1]$, and depends only on $N_j$. Denote $\min({}_jW_1, \ldots, {}_jW_{N_j+1})$ by $U(N_j)$, and $\max({}_jW_1, \ldots, {}_jW_{N_j+1})$ by $V(N_j)$. Then we have from the theorem of ([1], p. 252)
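$$
\lim_{\min_j N_j\to\infty} P\Big[\frac{U(N_j)}{z_j - z_{j-1}} > \frac{u_j}{(N_j + 1)^2},\ \frac{V(N_j)}{z_j - z_{j-1}} < \frac{\log(N_j + 1) - \log v_j}{N_j + 1},\ j = 1, \ldots, H \,\Big|\, N_1, \ldots, N_H\Big] = \exp\Big\{-\sum_{j=1}^{H}(u_j + v_j)\Big\},
$$

and this approach is uniform over any bounded subset of $u_1, \ldots, u_H, v_1, \ldots, v_H$ space. Now we set

$$
u_j = \frac{u}{z_j - z_{j-1}}\Big(\frac{N_j + 1}{n + 1}\Big)^2,\qquad
\log v_j = \log(N_j + 1) - \frac{N_j + 1}{n + 1}\cdot\frac{\log(n + 1) + \log M - \log v}{M(z_j - z_{j-1})}.
$$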


Then the inequalities in the conditional probability above become

$$
U(N_j) > \frac{u}{(n + 1)^2},\qquad V(N_j) < \frac{\log(n + 1) + \log M - \log v}{M(n + 1)}.
$$

$(N_j + 1)/(n + 1)$ can be written as $a_j(z_j - z_{j-1}) + Z_j(n)$, where $Z_j(n)$ is a chance variable such that $n^{1/2-\delta}Z_j(n)$ converges to zero with probability one as $n$ increases, for any positive $\delta$. This means that $Z_j(n)\log(n + 1)$ converges to zero with probability one as $n$ increases. Then it is easily verified that as $n$ increases, with probability one $u_j$ converges to $u a_j^2(z_j - z_{j-1})$, and $v_j$ converges to $v(z_j - z_{j-1})$ if $a_j = M$ and to zero if $a_j > M$. We turn our conditional probability above into an unconditional probability by taking the expectation with respect to $N_1, \ldots, N_H$, and we find

$$
\lim_{n\to\infty} P\Big[\min_j U(N_j) > \frac{u}{(n + 1)^2},\ \max_j V(N_j) < \frac{\log(n + 1) + \log M - \log v}{M(n + 1)}\Big] = \exp\{-(Su + Bv)\},
$$

for any positive values $u$, $v$.


The final step in our derivation is to show that with probability approaching one as $n$ increases, $U_n = \min_j U(N_j)$ and $V_n = \max_j V(N_j)$. If

$$
U_n \ne \min_j U(N_j),
$$

it means that for some $j$, $U(N_j) = {}_jW_1$ or ${}_jW_{N_j+1}$. But from the symmetry of the joint distribution of ${}_jW_1, \ldots, {}_jW_{N_j+1}$, it is easily seen that

$$
\lim_{n\to\infty} P\big(U(N_j) = {}_jW_1 \text{ or } {}_jW_{N_j+1}\big) = 0.
$$

Then $\lim_{n\to\infty} P[U_n = \min_j U(N_j)] = 1$. If $V_n \ne \max_j V(N_j)$, it implies that the sample spacing of maximum length contains one of the values $z_1, \ldots, z_{H-1}$. Denote by $T_j$ the length of the sample spacing which contains $z_j$. Simple calculations show that the limiting distribution of $(n + 1)T_j$ has a bounded density function. But $V_n \ge \max_j V(N_j)$, and the limiting probability given in the preceding paragraph shows that $(n + 1)\max_j V(N_j)$ grows without bound as $n$ increases. Thus with probability approaching one as $n$ increases,

$$
V_n > \max_j T_j.
$$


3. The large-sample power of certain tests of fit based on $U_n$ and $V_n$. The statistics $V_n/U_n$ and $V_n - U_n$ have been proposed to test the hypothesis that $f(x)$ is the uniform density function. We define $U_n'$ as $(n + 1)^2U_n$, and $V_n'$ as $M(n + 1)V_n - \log(n + 1)$. Above we showed that $U_n'$, $V_n'$ have a non-degenerate joint limiting distribution as $n$ increases. We have

$$
\frac{M V_n}{(n + 1)\log(n + 1)\,U_n} = \frac{V_n'}{\log(n + 1)\,U_n'} + \frac{1}{U_n'},\qquad
M(n + 1)(V_n - U_n) - \log(n + 1) = V_n' - \frac{M U_n'}{n + 1}.
$$

These relations imply that asymptotically, the test based on $V_n/U_n$ is equivalent to the test based on $U_n$ alone, and the test based on $V_n - U_n$ is equivalent to the test based on $V_n$ alone. Small values of $U_n$ are critical; large values of $V_n$ are critical. Using the results above, we find that the asymptotic critical value for $U_n$, if the desired level of significance is $\alpha$, is $-\log(1 - \alpha)/(n + 1)^2$, and the asymptotic power of the test based on $U_n$ is $1 - (1 - \alpha)^S$. Thus the test based on $U_n$ is not consistent against any alternative of the type we are considering. However, it is not difficult to show that the test based on $V_n$ is consistent against any such alternative.
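The critical value and limiting power above are simple closed forms. A sketch that evaluates them (function names hypothetical) shows how little power the $U_n$ test has when $S$ is only slightly above its null value 1; note that $S \ge 1$ always, by the Cauchy-Schwarz inequality.

```python
import math


def critical_value_U(alpha, n):
    """Reject uniformity when U_n < -log(1 - alpha)/(n + 1)^2 (asymptotic level alpha)."""
    return -math.log(1.0 - alpha) / (n + 1) ** 2


def asymptotic_power_U(alpha, S):
    """Limiting power 1 - (1 - alpha)^S of that test against a step density with S = int f^2."""
    return 1.0 - (1.0 - alpha) ** S


alpha = 0.05
print(critical_value_U(alpha, 99))      # about 5.13e-06
print(asymptotic_power_U(alpha, 1.0))   # approximately 0.05 under the null (S = 1)
print(asymptotic_power_U(alpha, 1.25))  # about 0.062, far from 1
```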




REFERENCE 


[1] D. A. Darling, "On a class of problems related to the random division of an interval," Ann. Math. Stat., Vol. 24 (1953), pp. 239-253.




GENERALIZED $D_n^+$ STATISTICS¹


By A. P. Dempster 
Harvard University 


1. Introduction. The purpose here is to present simplified derivation methods which can be applied to generalizations of some distributions derived by Birnbaum and Tingey [1] and Birnbaum and Pyke [2]. In the case of [1] the generalization is explicitly written down as equation (5′). Other authors have noticed this generalization; it appears implicitly in equation (31) of Chapman [3] and is given explicitly by Pyke [4]. However, the derivation given in the following section differs from the methods of other authors and gives a probabilistic meaning to each term in the summation formula (5′). In the case of [2], explicit formulas are given for a special case of our generalization different from that considered by Birnbaum and Pyke.

Consider a sample of $n$ from the uniform distribution on $(0, 1)$. Denote the sample c.d.f. by $F_n(x)$. The relevant part of the curve $y = F_n(x)$ is entirely contained in the closed unit square $0 \le x \le 1$, $0 \le y \le 1$, and within this square the population c.d.f. is represented by the line $y = x$. For $0 \le \delta < 1$ and $0 < \epsilon < 1$ the line joining $(0, \delta)$ and $(1 - \epsilon, 1)$ will be referred to as barrier $(\delta, \epsilon)$. A set of such barriers moving away from $y = x$ may be conceived of, and we are concerned with a set of probabilistic questions about which barriers are crossed, and where, by the curve $y = F_n(x)$ as it passes from $(1, 1)$ to $(0, 0)$.


2. The basic derivation. Denote by $f_j$ $(0 \le j \le n - 1)$ the probability that $y = F_n(x)$ crosses the barrier $(\delta, \epsilon)$ at level $y = (n - j)/n$, not having crossed it at any level $y = (n - i)/n$ for $i < j$. Denote the abscissa of the intersection of the barrier $(\delta, \epsilon)$ and level $y = (n - j)/n$ by $m_j$. Then it is easily checked that

$$
m_j = \frac{1 - \epsilon}{1 - \delta}\Big(1 - \delta - \frac{j}{n}\Big). \tag{1}
$$

Finally, let us use $b(r, s, p)$ for the binomial probability $\binom{s}{r}p^r(1 - p)^{s-r}$.


Received May 27, 1958; revised November 24, 1958.

¹ Work done in part at Princeton University while the author was supported by the National Research Council of Canada, and in part at Bell Telephone Laboratories while the author was a Member of Technical Staff.

An expression for $f_j$ may be derived as follows. Given that $y = F_n(x)$ passes through $(m_i, 1 - i/n)$, the conditional probabilities of passing through $(m_j, 1 - j/n)$ and $(m_j, 1 - (j - 1)/n)$ are respectively

$$
b\big(j - i,\, n - i,\, (m_i - m_j)/m_i\big)\quad\text{and}\quad b\big(j - i - 1,\, n - i,\, (m_i - m_j)/m_i\big),\qquad j > i.
$$

The unconditional probabilities of arriving at these 2 points can therefore each be computed in 2 ways as

$$
b(j, n, 1 - m_j) = \sum_{i=0}^{j-1} f_i\, b\Big(j - i,\, n - i,\, \frac{m_i - m_j}{m_i}\Big) + f_j \tag{2}
$$

and

$$
b(j - 1, n, 1 - m_j) = \sum_{i=0}^{j-1} f_i\, b\Big(j - i - 1,\, n - i,\, \frac{m_i - m_j}{m_i}\Big). \tag{3}
$$


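If equation (3) is multiplied by $[(n - j + 1)/(j - i)][(m_i - m_j)/m_j]$, which factor is independent of $i$ from (1) (since $m_i - m_j = (1 - \epsilon)(j - i)/[n(1 - \delta)]$), and (3) is then subtracted from (2), then $f_0, \ldots, f_{j-1}$ are eliminated and so

$$
f_j = b(j, n, 1 - m_j) - \frac{(n - j + 1)(1 - \epsilon)}{n(1 - \delta)\, m_j}\, b(j - 1, n, 1 - m_j),
$$

or after reduction

$$
f_j = \epsilon\binom{n}{j}(1 - m_j)^{j-1} m_j^{\,n-j}, \tag{4}
$$

or

$$
f_j = \epsilon\binom{n}{j}\Big[1 - \epsilon - \frac{1 - \epsilon}{1 - \delta}\cdot\frac{j}{n}\Big]^{n-j}\Big[\epsilon + \frac{1 - \epsilon}{1 - \delta}\cdot\frac{j}{n}\Big]^{j-1}.
$$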


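If $P(n, \delta, \epsilon) = 1 - Q(n, \delta, \epsilon)$ is defined to be the probability that $y = F_n(x)$ nowhere crosses the barrier $(\delta, \epsilon)$, then

$$
Q(n, \delta, \epsilon) = \sum_{j=0}^{k_\delta} f_j, \tag{5}
$$

where $k_\delta$ is the largest integer $k$ such that $k/n < 1 - \delta$. Thus in full form

$$
Q(n, \delta, \epsilon) = \epsilon\sum_{j=0}^{k_\delta}\binom{n}{j}\Big[1 - \epsilon - \frac{1 - \epsilon}{1 - \delta}\cdot\frac{j}{n}\Big]^{n-j}\Big[\epsilon + \frac{1 - \epsilon}{1 - \delta}\cdot\frac{j}{n}\Big]^{j-1}. \tag{5'}
$$

This formula is a direct generalization of the formula in [1] for $P_n(\epsilon)$, where $P_n(\epsilon) = P(n, \epsilon, \epsilon)$.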

Another interesting special case is $\delta = 0$. Here it should be made definite that we are considering only intersections of $y = F_n(x)$ and barriers $(0, \epsilon)$ occurring at points other than $(0, 0)$. A special derivation for $Q(n, 0, \epsilon)$ is given as follows. Considering once more the movement of $y = F_n(x)$ downward from $(1, 1)$ in the general case of barrier $(\delta, \epsilon)$, we see that $f_j$ is the probability that $y = F_n(x)$ passes through $(m_j, 1 - j/n)$ times the conditional probability that it did not cross the barrier between $(1, 1)$ and $(m_j, 1 - j/n)$, i.e.

$$
f_j = b(j, n, 1 - m_j)\, P\Big(j, 0, \frac{\epsilon}{1 - m_j}\Big), \tag{6}
$$

from which, using (4), we find

$$
P\Big(j, 0, \frac{\epsilon}{1 - m_j}\Big) = \frac{\epsilon\binom{n}{j}(1 - m_j)^{j-1}m_j^{\,n-j}}{\binom{n}{j}(1 - m_j)^{j}m_j^{\,n-j}} = \frac{\epsilon}{1 - m_j},
$$

whence by reparametrizing we may deduce that

$$
P(n, 0, \epsilon) = \epsilon \tag{7}
$$

independently of $n$. Finally from (5′) and (7) we have

$$
\epsilon\sum_{j=0}^{n-1}\binom{n}{j}\Big[(1 - \epsilon)\Big(1 - \frac{j}{n}\Big)\Big]^{n-j}\Big[\epsilon + (1 - \epsilon)\frac{j}{n}\Big]^{j-1} = 1 - \epsilon, \tag{8}
$$

the algebraic identity here having been circuitously derived.
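Formula (5′) is straightforward to evaluate, and both the identity (8) and the crossing probability itself can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and the Monte Carlo check uses the fact that the step curve $y = F_n(x)$ reaches the barrier exactly when some corner $(U_{(i)}, i/n)$ lies on or above it, i.e. when $U_{(i)} \le m_{n-i}$.

```python
import random
from math import comb


def Q(n, delta, eps):
    """Probability (5') that y = F_n(x) crosses barrier (delta, eps)."""
    c = (1.0 - eps) / (1.0 - delta)
    total = 0.0
    for j in range(n + 1):
        if j / n >= 1.0 - delta:  # k_delta is the largest j with j/n < 1 - delta
            break
        x = j / n
        total += comb(n, j) * (1.0 - eps - c * x) ** (n - j) * (eps + c * x) ** (j - 1)
    return eps * total


def crossing_frequency(n, delta, eps, reps, seed=0):
    """Monte Carlo frequency of some U_(i) <= m_{n-i} = (1 - eps)(i/n - delta)/(1 - delta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        us = sorted(rng.random() for _ in range(n))
        hits += any(u <= (1.0 - eps) * (i / n - delta) / (1.0 - delta)
                    for i, u in enumerate(us, start=1))
    return hits / reps


print(Q(10, 0.1, 0.2), crossing_frequency(10, 0.1, 0.2, 20000))
print(Q(7, 0.0, 0.3))  # by (7)-(8) this should be 1 - 0.3
```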
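It is worth remarking that if $G(x)$ is any continuous c.d.f. and $G_n(x)$ the sample c.d.f. for a sample of $n$ from $G(x)$, we have by transformation from

$$
P(n, \delta, \epsilon) = \Pr\Big(F_n(x) \le \delta + \frac{1 - \delta}{1 - \epsilon}\,x \text{ for } 0 < x < 1\Big)
$$

that

$$
P(n, \delta, \epsilon) = \Pr\Big(G_n(x) \le \delta + \frac{1 - \delta}{1 - \epsilon}\,G(x) \text{ for } -\infty < x < \infty\Big), \tag{9}
$$

$$
P(n, \delta, \epsilon) = \Pr\big((1 - \epsilon)G_n(x) - (1 - \delta)G(x) \le \delta(1 - \epsilon) \text{ for } -\infty < x < \infty\big) \tag{10}
$$

for $0 \le \delta < 1$ and $0 < \epsilon < 1$. Therefore, given any real numbers $a$, $b$ and $c$, we can express $\Pr(aG_n(x) + bG(x) \le c \text{ for } -\infty < x < \infty)$ as 0 or 1 or $P(n, \delta, \epsilon)$ for correctly chosen $\delta$ and $\epsilon$ depending only on $a$, $b$ and $c$. In particular

$$
\Pr\big(G_n(x) \le aG(x) \text{ for } -\infty < x < \infty\big) =
\begin{cases}
1 - 1/a & \text{if } a \ge 1,\\
0 & \text{if } a \le 1.
\end{cases} \tag{11}
$$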


3. The statistics $D_n^+$, $U^*$ and $i^*$. Suppose we consider the class of barriers $(d, d)$ moving away from the line $y = x$ as $d$ moves from 0 to 1. The $d$ corresponding to the furthest barrier reached defines the random variable $D_n^+$, and the point $(U^*, i^*/n)$ where this furthest barrier is touched defines the random variables $U^*$ and $i^*$. Since

$$
U^* = \frac{i^*}{n} - D_n^+, \tag{12}
$$

the joint distribution of any 2 of the 3 random variables determines the joint distribution of all 3. In [1] the distribution of $D_n^+$ is presented: $\Pr(D_n^+ < d) = P_n(d)$. In [2] the joint and marginal distributions of $U^*$ and $i^*$ are derived.

Now it is possible to generalize this situation by defining a class of linear barriers moving away from the line $y = x$, restricted only by the requirement that no 2 members of the class intersect within the unit square. These barriers may be indexed by a real variable $d$, with at most one barrier corresponding to a given $d$, and described as barriers $(\delta(d), \epsilon(d))$, where $\delta(d)$ and $\epsilon(d)$ are monotone non-decreasing. We can allow $d$ to take values in a discrete or continuous set. Random variables $D_n^+$, $U^*$ and $i^*$ may be defined as in the previous paragraph, where relation (12) becomes

$$
U^* = \frac{\big(1 - \epsilon(D_n^+)\big)\big(i^*/n - \delta(D_n^+)\big)}{1 - \delta(D_n^+)}. \tag{13}
$$

Clearly now $\Pr(D_n^+ < d) = P(n, \delta(d), \epsilon(d))$, and this gives the distribution of the generalized $D_n^+$. A method of writing down the joint distribution of $U^*$ and $i^*$ which applies in general will be demonstrated in the following section, applied to a special case where the formulas become fairly simple.


4. The class of barriers $(0, d)$. The class of barriers $(0, d)$ formed by rotating a line through the origin may be of some statistical interest. It has been seen in this case that

$$
\Pr(D_n^+ \le d) = d. \tag{14}
$$

It is proposed now to investigate the joint distribution of $U^*$ and $i^*$. Define $H(i, u) = \Pr(i^* = i \text{ and } U^* \le u)$ and $h(i, u) = dH(i, u)/du$, so that $h(i, u)$ is the density of the joint distribution along the line $y = i/n$. It is evident that $h(i, u) = P_1P_2P_3$, where $P_1$ is the density function at $u$ of the $i$th order statistic, $P_2$ is the conditional probability, given that $y = F_n(x)$ passes through $(u, i/n)$, that it does not touch the barrier through $(u, i/n)$ above level $i/n$, and $P_3$ is as $P_2$ but replacing above by below. Thus

$$
P_1 = \frac{n!}{(i - 1)!\,(n - i)!}\,u^{i-1}(1 - u)^{n-i},\qquad
P_2 = P\Big(n - i, 0, \frac{1 - (n/i)u}{1 - u}\Big) = \frac{1 - (n/i)u}{1 - u},
$$

and

$$
P_3 = P(i - 1, 0, 1/i) = 1/i,
$$





so that

$$
h(i, u) = \binom{n}{i}u^{i-1}(1 - u)^{n-i-1}\big(1 - (n/i)u\big) \tag{15}
$$

for $0 \le u \le i/n$, which integrates to give

$$
H(i, u) = \frac{1}{i}\binom{n}{i}u^{i}(1 - u)^{n-i}. \tag{16}
$$

The marginal distributions are given by

$$
p_i = \Pr(i^* = i) = H\Big(i, \frac{i}{n}\Big) = \frac{1}{i}\binom{n}{i}\Big(\frac{i}{n}\Big)^i\Big(1 - \frac{i}{n}\Big)^{n-i} \tag{17}
$$

and

$$
K(u) = \Pr(U^* \le u) = \sum_{i=1}^{n} H\big(i, \min[i/n, u]\big) = \sum_{i \le nu} p_i + \sum_{i > nu}\frac{1}{i}\binom{n}{i}u^i(1 - u)^{n-i}. \tag{18}
$$

The algebraic identity implied by the relation $\sum_{i=1}^{n} p_i = 1$, like that in (8), has been indirectly derived. Both identities may be algebraically proved using the formula quoted as (5) in [2].

We repeat that the method used here to obtain $H(i, u)$ is applicable to the general class of barriers $(\delta(d), \epsilon(d))$.
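For the barriers $(0, d)$ the quantities $D_n^+$, $i^*$ and $U^*$ are easy to simulate: the barrier through the origin is reached at level $i/n$ exactly when $d \le 1 - nU_{(i)}/i$, so $D_n^+ = \max_i(1 - nU_{(i)}/i)$ and $i^*$ is the maximizing index. The sketch below (helper names hypothetical) compares simulated frequencies of $i^*$ with (17), evaluates $\sum_i p_i$ exactly, and lets one see that $D_n^+$ behaves as a uniform variable, as (14) states.

```python
import random
from fractions import Fraction
from math import comb


def p_i(n, i):
    """Equation (17): Pr(i* = i) = (1/i) C(n, i) (i/n)^i (1 - i/n)^(n - i)."""
    x = Fraction(i, n)
    return Fraction(1, i) * comb(n, i) * x ** i * (1 - x) ** (n - i)


def simulate(n, reps, seed=0):
    """Barriers (0, d): D = max_i (1 - n U_(i)/i), i* = the maximizing index i."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    d_values = []
    for _ in range(reps):
        us = sorted(rng.random() for _ in range(n))
        d, i_star = max((1.0 - n * u / i, i) for i, u in enumerate(us, start=1))
        counts[i_star] += 1
        d_values.append(d)
    return [c / reps for c in counts], d_values


n = 5
freq, d_values = simulate(n, 20000)
for i in range(1, n + 1):
    print(i, round(freq[i], 4), float(p_i(n, i)))
print(sum(p_i(n, i) for i in range(1, n + 1)))          # exactly 1
print(sum(d <= 0.3 for d in d_values) / len(d_values))  # close to 0.3 by (14)
```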


REFERENCES

[1] Z. W. Birnbaum and F. H. Tingey, "One-sided confidence contours for probability distribution functions," Ann. Math. Stat., Vol. 22 (1951), pp. 592-596.

[2] Z. W. Birnbaum and R. Pyke, "On some distributions related to the statistic $D_n^+$," Ann. Math. Stat., Vol. 29 (1958), pp. 179-187.

[3] D. G. Chapman, "A comparative study of several one-sided goodness-of-fit tests," Ann. Math. Stat., Vol. 29 (1958), pp. 655-674.

[4] R. Pyke, "The supremum and infimum of the Poisson process," Technical Report No. 39, Applied Mathematics and Statistics Laboratory, Stanford University, March 1958 (Abstract in Ann. Math. Stat., Vol. 29 (1958), p. 327).




APPLICATIONS OF A CERTAIN REPRESENTATION OF THE 
WISHART MATRIX 


By Robert A. Wijsman


University of Illinois 


0. Summary. Apart from pre- and post-multiplication by a fixed matrix and its transpose, the Wishart matrix $A$ can be written as the product of a triangular matrix and its transpose, whose elements are independent normal and chi variables. Various applications of this representation are indicated. Examples are given concerning the diagonal elements of $A^{-1}$, the sample ordinary and multiple correlation coefficient, the characteristic roots of $A$ and the sphericity criterion in the bivariate case.

Received July 17, 1958.


1. Introduction. Let $X$ be a $p \times n$ matrix, with $p \le n$, whose columns are independent and distributed according to a $p$-variate normal law with mean vector 0 and covariance matrix $\Sigma$. The matrix $A = XX'$ will be called the Wishart matrix. If $\Sigma = I_p$ (the $p \times p$ identity matrix), then $A$ can be written in the form

$$
A = TT' \tag{1}
$$

in which $T$ is a triangular $p \times p$ matrix whose elements are independent random variables, the non-zero off-diagonal elements being $N(0, 1)$, and the diagonal elements being $\chi$ variables with certain degrees of freedom.¹ More specifically, if we choose $T$ to be lower triangular (which we shall do from now on), then $T_{ii}$ is a $\chi$ variable with $n - i + 1$ degrees of freedom. The representation (1) is known; in fact, it is implied by the Bartlett decomposition [2]. However, only a few authors, like Mauldon [6], state the representation (1) and the nature of $T$ explicitly.² Equation (1) is also implied in [8]. It is the purpose of this note to point out some of the applications of (1), or, as the case may be, of the more general equation (2) below.
If $\Sigma$ does not necessarily equal $I_p$, let $C$ be a $p \times p$ matrix such that

$$
CC' = \Sigma.
$$

Then the Wishart matrix can be represented as

$$
A = CTT'C' \tag{2}
$$

with $T$ as before. It may be convenient in applications to choose $C$ also lower triangular. If $\Sigma = I_p$, we may take $C = I_p$, so that (2) reduces to (1).
Equation (2) can be used in several ways. In the first place, the Wishart distribution can be derived very easily starting from the distribution of $T$, since the Jacobian of the transformation (2) from $T$ to $A$ is simple to compute. Secondly, if it is desired to generate values of $A$, or of a function of $A$, by a random process (for an application see [7]), this can be done conveniently by generating values of $T$. In the third place, the distribution and certain properties of functions of $A$ can sometimes be obtained quite easily by expressing them as functions of the elements of $T$. It is of this third kind of application that we will give some examples. Concerning notation and nomenclature, the ratio of a normal and a central $\chi$ variable will be called a $t'$ variable, and the ratio of a $\chi^2$ variable and a central $\chi^2$ variable will be called an $F'$ variable (the primes are used here to distinguish these variables from the customary $t$ and $F$ variables, in which the $\chi^2$ variables have been divided by their degrees of freedom). The degrees of freedom of a $t'$ or $F'$ variable will be indicated by subscripts on $t'$ or $F'$.

¹ In [8], footnote 3, the diagonal elements were erroneously termed $\chi^2$ variables.
² Very recently, A. M. Kshirsagar (Ann. Math. Stat., Vol. 30 (1959), pp. 239-241) also gave the decomposition (1) and a simple derivation.
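The second use of (2), generating values of $A$ through $T$, can be sketched with NumPy as follows. This is an illustration rather than the author's procedure: it takes $C$ to be the lower-triangular Cholesky factor of $\Sigma$, and the covariance matrix and sample sizes in the usage lines are arbitrary choices. Since $E(A) = n\Sigma$, the average of many draws should be close to $n\Sigma$.

```python
import numpy as np


def bartlett_T(p, n, rng):
    """Random lower-triangular T of (1): row i (0-based) has T[i, i] ~ chi with
    n - i degrees of freedom (n - i + 1 in the 1-based indexing of the text)
    and independent N(0, 1) entries to the left of the diagonal."""
    T = np.zeros((p, p))
    for i in range(p):
        T[i, i] = np.sqrt(rng.chisquare(n - i))
        T[i, :i] = rng.standard_normal(i)
    return T


def wishart_draw(sigma, n, rng):
    """One draw of A = C T T' C' as in (2), with C the Cholesky factor (C C' = sigma)."""
    C = np.linalg.cholesky(sigma)  # lower triangular
    CT = C @ bartlett_T(sigma.shape[0], n, rng)
    return CT @ CT.T


rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
n, reps = 10, 20000
mean_A = sum(wishart_draw(sigma, n, rng) for _ in range(reps)) / reps
print(mean_A)      # should be close to n * sigma
print(n * sigma)
```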


2. Applications.


EXAMPLE 1: THE DIAGONAL ELEMENTS OF $A^{-1}$ IF $\Sigma = I_p$. These diagonal elements are obviously identically distributed, so it suffices to consider $(A^{-1})_{pp}$. By (1) we have $A^{-1} = T'^{-1}T^{-1}$ (with probability 1, $T$ is non-singular). Now $T^{-1}$ is also lower triangular, and its diagonal elements are the reciprocals of the corresponding diagonal elements of $T$. Since the last column of $T^{-1}$ has only the entry $(T^{-1})_{pp}$, we find

$$
(A^{-1})_{pp} = \sum_{k}\big[(T^{-1})_{kp}\big]^2 = \big[(T^{-1})_{pp}\big]^2 = T_{pp}^{-2}.
$$

Hence, each diagonal element of $A^{-1}$ is the reciprocal of a $\chi^2_{n-p+1}$ variable. This result can be applied to exhibit Hotelling's $T^2$ as a constant times an $F$ variable [8].


EXAMPLE 2: THE SAMPLE CORRELATION COEFFICIENT. This is essentially a bivariate problem, so we may set $p = 2$. Let the population correlation coefficient be $\rho$, and the sample correlation coefficient $r = A_{21}(A_{11}A_{22})^{-1/2}$. It suffices to assume $\Sigma_{11} = \Sigma_{22} = 1$, $\Sigma_{12} = \Sigma_{21} = \rho$. This can most conveniently be effected by choosing $C$ lower triangular, with $C_{11} = 1$, $C_{21} = \rho$, $C_{22} = (1 - \rho^2)^{1/2}$. From (2) we then compute

$$\frac{r}{\sqrt{1 - r^2}} = \frac{T_{21} + T_{11}\rho/\sqrt{1 - \rho^2}}{T_{22}}. \tag{3}$$

The same expression was also obtained by Elfving [3], following a different method. The right-hand side of (3) can be described as a non-central $t'_{n-1}$ variable, with a random non-centrality parameter $T_{11}\rho(1 - \rho^2)^{-1/2}$, which is a $\chi_n$ variable times $\rho(1 - \rho^2)^{-1/2}$. From this remark an expression for the density $p(\rho, \cdot)$ of $r(1 - r^2)^{-1/2}$ follows at once. Let $f(\rho, \cdot)$ be the density of $T_{11}\rho(1 - \rho^2)^{-1/2}$, and $g(\xi, \cdot)$ the density of a non-central $t'_{n-1}$ variable with non-centrality parameter $\xi$. Then

$$p(\rho, x) = \int f(\rho, \xi)\, g(\xi, x)\, d\xi. \tag{4}$$

From (4) the monotonicity of the probability ratio then follows immediately by applying a theorem of Lehmann [5] (Theorem 3), or a theorem of Karlin [4] (Lemma 5).
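Equation (3) can be checked the same way. In the Python sketch below (hypothetical names), $r$ is computed directly from $A = CTT'C'$ with the $C$ of Example 2. The quantity $r/\sqrt{1 - r^2}$ is then compared with the right-hand side of (3) for a few arbitrary values of $T_{11} > 0$, $T_{21}$, $T_{22} > 0$ and $\rho$.

```python
import math


def corr_from_T(T11, T21, T22, rho):
    """Sample correlation r computed directly from A = C T T' C' (p = 2, C as in Example 2)."""
    s = math.sqrt(1.0 - rho * rho)
    # B = C T is lower triangular with these three nonzero entries.
    b11, b21, b22 = T11, rho * T11 + s * T21, s * T22
    A11, A21, A22 = b11 * b11, b21 * b11, b21 * b21 + b22 * b22  # A = B B'
    return A21 / math.sqrt(A11 * A22)


def rhs_of_3(T11, T21, T22, rho):
    """Right-hand side of (3)."""
    return (T21 + T11 * rho / math.sqrt(1.0 - rho * rho)) / T22
```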


EXAMPLE 3: THE SAMPLE MULTIPLE CORRELATION COEFFICIENT. Let $\bar R$ be the population multiple correlation coefficient between the $p$th variate and the first $p - 1$ variates, and let $R$ be the corresponding sample quantity. Then

$$1 - R^2 = \frac{|A|}{|A^*|\, A_{pp}}, \tag{5}$$

where $A^*$ is obtained from $A$ by deleting the last row and column. It is sufficient to choose $C$ to be lower triangular, with $C_{11} = \cdots = C_{p-1,p-1} = 1$, $C_{pp} = (1 - \bar R^2)^{1/2}$, $C_{p1} = \bar R$, and all other elements equal to 0.

600 ROBERT A. WIJSMAN

Substituting into (2) and using (5), we derive

$$\frac{R^2}{1 - R^2} = \frac{\big(T_{p1} + T_{11}\bar R/\sqrt{1 - \bar R^2}\big)^2 + \sum_{j=2}^{p-1} T_{pj}^2}{T_{pp}^2}. \tag{6}$$

This can be described as a non-central $F'_{p-1,\,n-p+1}$ variable, with a random non-centrality parameter $T_{11}^2\bar R^2/(1 - \bar R^2)$, which is $\bar R^2/(1 - \bar R^2)$ times a $\chi^2_n$ variable. An expression for the density of $R^2/(1 - R^2)$ can then be written down at once, and the monotonicity of the probability ratio follows in the same way as in Example 2.
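A similar numerical check of (6) is sketched below in Python. Helper names are hypothetical and the determinant is a plain Gaussian elimination. The sketch computes $R^2/(1 - R^2)$ from $A = CTT'C'$ through (5), using the $C$ of Example 3, and compares the result with the right-hand side of (6).

```python
import math


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [[float(x) for x in row] for row in M]
    n, sign, out = len(M), 1.0, 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[piv][c] == 0.0:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            sign = -sign
        out *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return sign * out


def ratio_from_A(T, Rbar):
    """R^2 / (1 - R^2) computed via (5) from A = C T T' C' with the C of Example 3."""
    p = len(T)
    C = [[float(i == j) for j in range(p)] for i in range(p)]
    C[p - 1][p - 1] = math.sqrt(1.0 - Rbar ** 2)
    C[p - 1][0] = Rbar
    B = [[sum(C[i][k] * T[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    A = [[sum(B[i][k] * B[j][k] for k in range(p)) for j in range(p)] for i in range(p)]
    A_star = [row[:-1] for row in A[:-1]]  # delete last row and column
    one_minus_R2 = det(A) / (det(A_star) * A[p - 1][p - 1])
    return (1.0 - one_minus_R2) / one_minus_R2


def rhs_of_6(T, Rbar):
    """Right-hand side of (6)."""
    p = len(T)
    shifted = T[p - 1][0] + T[0][0] * Rbar / math.sqrt(1.0 - Rbar ** 2)
    num = shifted ** 2 + sum(T[p - 1][j] ** 2 for j in range(1, p - 1))
    return num / T[p - 1][p - 1] ** 2
```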


EXAMPLE 4: THE CHARACTERISTIC ROOTS OF $A$ AND THE SPHERICITY CRITERION IN THE CASE $p = 2$ AND $\Sigma = I_2$. Let the square roots of the characteristic roots of $A$ be $\lambda_1$ and $\lambda_2$ ($\lambda_1 \ge \lambda_2$). We have

$$\lambda_1^2 + \lambda_2^2 = \operatorname{tr} A, \tag{7}$$

$$\lambda_1\lambda_2 = |A|^{1/2}. \tag{8}$$

The joint distribution of $\lambda_1$ and $\lambda_2$ is determined by the joint distribution of $(\lambda_1 - \lambda_2)^2$ and $2\lambda_1\lambda_2$, which turns out to be very simple. Using (7), (8) and (1) we compute

$$(\lambda_1 - \lambda_2)^2 = (T_{11} - T_{22})^2 + T_{21}^2, \tag{9}$$

$$2\lambda_1\lambda_2 = 2T_{11}T_{22}. \tag{10}$$

By the lemma below, the right-hand sides of (9) and (10) are independent, and distributed as $\chi^2_2$ and $\chi^2_{2n-2}$ respectively.

LEMMA. If $X$ and $Y$ are independent and distributed as $\chi_n$ and $\chi_{n-1}$, respectively, then $(X - Y)^2$ and $2XY$ are independent and distributed as $\chi^2_1$ and $\chi^2_{2n-2}$, respectively.

The proof is straightforward, and will be omitted.

The sphericity criterion (Anderson [1], Section 10.7) in the bivariate case is the ratio $Z = 2|A|^{1/2}/\operatorname{tr} A$, which can also be written as

$$Z = \frac{2\lambda_1\lambda_2}{\lambda_1^2 + \lambda_2^2}.$$

Using (9) and (10) we find

$$\frac{Z}{1 - Z} = \frac{2T_{11}T_{22}}{(T_{11} - T_{22})^2 + T_{21}^2}, \tag{11}$$

which, by the lemma, is an $F'_{2n-2,\,2}$ variable.
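Example 4 and the lemma can be checked by simulation. The Python sketch below is an illustration under the same assumptions as (1): $T_{11}$ is a $\chi_n$ variable, $T_{22}$ a $\chi_{n-1}$ variable, and $T_{21}$ standard normal, all independent. It computes $\lambda_1$ and $\lambda_2$ from $A = TT'$ directly. It then checks that the quantities in (9) and (10) have sample means near 2 and $2n - 2$ and are nearly uncorrelated, as the lemma predicts. Names are hypothetical. The ratio $Z/(1 - Z)$ itself has no finite mean, since $E(1/\chi^2_2) = \infty$, so its sample mean would not be a useful check.

```python
import math
import random


def roots_sqrt(A):
    """Square roots lambda1 >= lambda2 of the characteristic roots of a 2 x 2 symmetric A."""
    tr = A[0][0] + A[1][1]
    dt = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * dt, 0.0))
    return math.sqrt((tr + disc) / 2.0), math.sqrt(max((tr - disc) / 2.0, 0.0))


def chi(df, rng):
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)))


def simulate(n, reps, seed=0):
    """Draws of (lambda1 - lambda2)^2 and 2 lambda1 lambda2 with A = T T'; Z/(1-Z) = v/u."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(reps):
        t11, t22, t21 = chi(n, rng), chi(n - 1, rng), rng.gauss(0.0, 1.0)
        A = [[t11 * t11, t11 * t21], [t11 * t21, t21 * t21 + t22 * t22]]
        l1, l2 = roots_sqrt(A)
        us.append((l1 - l2) ** 2)  # equals (T11 - T22)^2 + T21^2 by (9)
        vs.append(2.0 * l1 * l2)   # equals 2 T11 T22 by (10)
    return us, vs


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```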

The statistic $Z$, or, equivalently, $Z/(1 - Z)$, can be used to test the hypothesis $H$ that $\Sigma = \sigma^2 I_2$ ($\sigma^2$ unknown) against the alternative that this is not so. The likelihood ratio test rejects $H$ if $Z <$ constant. It can be shown that this test is also uniformly most powerful invariant. In the first place, $Z$ is a maximal invariant. Secondly, the distribution of $Z$ depends on a single parameter only, e.g. on $2|\Sigma|^{1/2}/\operatorname{tr}\Sigma$.

DVORETZKY’S STOCHASTIC APPROXIMATION THEOREM 601

In the third place, it can be shown that the probability ratio is monotonic. This can be demonstrated either by starting from the Wishart distribution, or by using (2). However, in this example the latter way does not seem to be any simpler than the former. The moral seems to be that in some cases the representation (1) or (2) leads to the results quickly and elegantly, but in other cases the conventional approach may be more practical.


REFERENCES 


[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley and Sons, New York, 1958.

[2] M. S. Bartlett, "On the theory of statistical regression," Proc. Roy. Soc. Edinburgh, Vol. 53 (1933), pp. 260-283.

[3] G. Elfving, "A simple method of deducing certain distributions connected with multivariate sampling," Skand. Aktuarietids., Vol. 30 (1947), pp. 56-74.

[4] Samuel Karlin, "Decision theory for Pólya type distributions. Case of two actions, I," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, 1956, pp. 115-128.

[5] E. L. Lehmann, "Ordered families of distributions," Ann. Math. Stat., Vol. 26 (1955), pp. 399-419.

[6] J. G. Mauldon, "Pivotal quantities for Wishart's and related distributions, and a paradox in fiducial theory," J. Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp. 79-85.

[7] Daniel Teichroew and Rosedith Sitgreaves, "Computation of an empirical sampling distribution for the classification statistic W," Probability and Statistics in Item Analysis and Classification Problems, School of Aviation Medicine, USAF, Texas, No. 57-98 (1957).

[8] Robert A. Wijsman, "Random orthogonal transformations and their use in some classical distribution problems in multivariate analysis," Ann. Math. Stat., Vol. 28 (1957), pp. 415-423.


ON DVORETZKY’S STOCHASTIC APPROXIMATION THEOREM 


C. DERMAN¹ AND J. SACKS²


Columbia University 


1. Introduction. A very general theorem on the convergence of transformations with superimposed random errors was proved by Dvoretzky [2]. This work followed that of Robbins-Monro [5] and others (see [6] for a bibliography), and it contains the most comprehensive results on convergence (with probability one and in mean square) of the stochastic approximation procedures of Robbins-Monro [5] and Kiefer-Wolfowitz [3]. Wolfowitz [7] provided another proof of Dvoretzky's theorem. In this note we provide a third proof, of the probability-one version, which is simpler than the previous two. The method of proof also extends directly to the multidimensional case. The multidimensional results obtained by Block [1] do not seem to include the result below. Mean square convergence does not seem to follow readily from our methods.

Received August 21, 1958.

1 This research was sponsored by the Office of Naval Research under Contract Nonr-266(55), Project NR 042-099.

2 This research was sponsored by the Office of Naval Research under Contract Nonr-266(51), Project NR 043-198.

Reproduction in whole or in part is permitted for any purpose of the United States Government.


2. Fundamental lemmas. Here we prove two lemmas on which our argument rests. Lemma 1 is used to prove Theorem 1 below, the one-dimensional version of Dvoretzky's theorem. Lemma 2 is used in the proof of Theorem 2, the multidimensional version of Theorem 1.

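LEMMA 1. Let $\{a_n\}$, $\{b_n\}$, $\{c_n\}$, $\{\delta_n\}$, and $\{\xi_n\}$ be sequences of real numbers satisfying

(i) $\{a_n\}$, $\{b_n\}$, $\{c_n\}$, $\{\xi_n\}$ are non-negative,

(ii) $\lim_{n\to\infty} a_n = 0$, $\sum b_n < \infty$, $\sum c_n = \infty$, $\sum \delta_n$ converges,

and, for all $n$ larger than some $N_0$,

(iii) $\xi_{n+1} \le \max\big(a_n,\ (1 + b_n)\xi_n + \delta_n - c_n\big)$.

Then $\lim_{n\to\infty} \xi_n = 0$.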
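PROOF. Let $N > N_0$ and write

$$B_n = \prod_{i=1}^{n} (1 + b_i). \tag{1}$$

Take $n > N$ and iterate (iii) back to $N$. This yields

$$\xi_{n+1} \le \max\Big(\frac{B_n}{B_{N-1}}\,\xi_N + B_n\sum_{j=N}^{n}\frac{\delta_j - c_j}{B_j},\ \ \max_{N \le k \le n}\Big[\frac{B_n}{B_k}\,a_k + B_n\sum_{j=k+1}^{n}\frac{\delta_j - c_j}{B_j}\Big]\Big). \tag{2}$$

Now (i) and (ii) imply that $B_n$ increases to a finite limit $B$ (say). It follows that $\sum_j \delta_j/B_j$ converges (by Abel's test, since $1/B_j$ is monotone and bounded) and that $\sum_j c_j/B_j \ge B^{-1}\sum_j c_j = \infty$. Since $(B_n/B_{N-1})\xi_N$ stays bounded while $B_n \ge 1$ and $\sum_{j=N}^{n}(\delta_j - c_j)/B_j \to -\infty$, the first term on the right-hand side of (2) is negative for large enough $n$ and, because $\xi_{n+1} \ge 0$, can therefore be ignored. Dropping also the non-positive terms $-c_j/B_j$ and using $1 \le B_k \le B_n \le B$, we have, for $n$ large enough,

$$\xi_{n+1} \le \max_{N \le k \le n}\Big[\frac{B_n}{B_k}\,a_k + B_n\sum_{j=k+1}^{n}\frac{\delta_j}{B_j}\Big] \le B\Big(\max_{k \ge N} a_k + \max_{N \le k \le n}\Big|\sum_{j=k+1}^{n}\frac{\delta_j}{B_j}\Big|\Big). \tag{3}$$

Since $\sum \delta_j/B_j$ converges and $a_n \to 0$, the right member of (3) can be made arbitrarily small by choosing $N$ large enough. This completes the proof of Lemma 1.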
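As a numerical illustration (not part of the proof), the Python sketch below iterates (iii) with equality for one hypothetical choice of sequences satisfying (i) and (ii): $a_n = c_n = 1/n$, $b_n = 1/n^2$, $\delta_n = (-1)^n/n$. Here $\sum \delta_n$ converges only conditionally, which Lemma 1 permits.

```python
def iterate_lemma1(n_steps, xi_start=1.0):
    """Iterate (iii) with equality for one hypothetical choice of a_n, b_n, c_n, delta_n."""
    xi = xi_start
    for n in range(1, n_steps + 1):
        a = 1.0 / n            # a_n -> 0
        b = 1.0 / n ** 2       # b_n >= 0, sum b_n < infinity
        c = 1.0 / n            # c_n >= 0, sum c_n = infinity
        delta = (-1) ** n / n  # sum delta_n converges (alternating harmonic series)
        xi = max(a, (1.0 + b) * xi + delta - c)
    return xi
```

After a short transient the iterates settle at about the level $a_n = 1/n$. This is consistent with the bound (3), whose first term is $B\max_{k \ge N} a_k$.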

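LEMMA 2. Let $\{a_n\}$, $\{c_n\}$, $\{\xi_n\}$ be as in Lemma 1 (in particular, (iii) holds). Suppose

(i) the $\delta_n$ are positive and $\sum \delta_n < \infty$,

(ii) $\sum b_n$ converges and $\sum b_n^2 < \infty$.

Then $\lim_{n\to\infty} \xi_n = 0$.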





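PROOF. Let $N$ be large enough so that $|b_n| < 1$ for $n > N$. Since $\sum b_n$ converges we have, for $n \ge k > N$,

$$0 < \frac{B_n}{B_k} = \prod_{i=k+1}^{n} (1 + b_i) \le \exp\Big[\sum_{i=k+1}^{n} b_i\Big] \le \exp\Big[\sup_{n \ge N}\ \max_{N \le k \le n}\Big|\sum_{i=k+1}^{n} b_i\Big|\Big] = \alpha < \infty. \tag{4}$$

Also, because of (ii) and the fact that

$$\frac{B_n}{B_N} = \prod_{i=N+1}^{n} (1 + b_i) = \prod_{i=N+1}^{n} \frac{1 - b_i^2}{1 - b_i} \ge \prod_{i=N+1}^{n} (1 - b_i^2)\ \exp\Big[\sum_{i=N+1}^{n} b_i\Big]$$

(since $1/(1 - b_i) \ge e^{b_i}$), we have

$$\inf_{n > N} \frac{B_n}{B_N} > 0. \tag{5}$$

With (4) and (5) established, the proof goes through as in Lemma 1.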

We remark that, if the sequences $\{a_n\}, \ldots, \{\xi_n\}$ are random variables which satisfy the stated conditions with probability one, then the conclusions of the lemmas hold with probability one.
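Lemma 2 differs from Lemma 1 in letting $b_n$ take both signs. It requires $\sum b_n$ to converge and $\sum b_n^2 < \infty$, but allows $\sum |b_n| = \infty$. The Python sketch below iterates one hypothetical instance in the same way: $b_n = (-1)^{n+1} n^{-3/4}$, $\delta_n = 1/n^2$, $a_n = c_n = 1/n$. The loop starts at $n = 2$ so that $|b_n| < 1$, as in the proof.

```python
def iterate_lemma2(n_steps, xi_start=1.0):
    """Iterate (iii) with equality when b_n changes sign (hypothetical sequences)."""
    xi = xi_start
    for n in range(2, n_steps + 1):       # start at n = 2 so that |b_n| < 1
        a = 1.0 / n
        b = (-1) ** (n + 1) * n ** -0.75  # sum b_n converges, sum b_n^2 < inf, sum |b_n| = inf
        c = 1.0 / n
        delta = 1.0 / n ** 2              # positive and summable
        xi = max(a, (1.0 + b) * xi + delta - c)
    return xi
```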


3. Stochastic approximation theorems. 
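THEOREM 1 (Dvoretzky). Let $\{X_n\}$, $\{T_n(X_1, \ldots, X_n)\}$, $\{Y_n(X_1, \ldots, X_n)\}$ be sequences of real random variables with $X_1$ arbitrary and

$$X_{n+1} = T_n(X_1, \ldots, X_n) + Y_n(X_1, \ldots, X_n). \tag{6}$$

Assume

$$E\{Y_n \mid X_1, \ldots, X_n\} = 0 \quad \text{w.p.1}, \tag{7}$$

$$\sum E Y_n^2 < \infty, \tag{8}$$

and

$$|T_n| \le \max\big(\alpha_n,\ (1 + \beta_n)|X_n| - \gamma_n\big), \tag{9}$$

where $\alpha_n$, $\beta_n$, $\gamma_n$ are positive numbers such that

$$\alpha_n \to 0, \qquad \sum \beta_n < \infty, \qquad \sum \gamma_n = \infty. \tag{10}$$

Then $X_n \to 0$ w.p.1.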
PROOF. We may assume that $\{\alpha_n\}$ is such that

$$\sum \frac{E Y_n^2}{\alpha_n^2} < \infty. \tag{11}$$

For, if this is not the case, there is always a sequence $\{\alpha_n'\}$ which satisfies (10) and (11). Taking $\alpha_n'' = \max(\alpha_n, \alpha_n')$ we obtain a sequence which satisfies (9), (10), and (11).

Define $Z_n = Y_n \operatorname{sgn} T_n$. Since $\operatorname{sgn} T_n$ is a function of $X_1, \ldots, X_n$, (7), (8), and (11) hold with $Y_n$ replaced by $Z_n$. Now (7) and (8) imply that $\sum Z_n$ converges w.p.1. From (11), the Chebyshev inequality, and the Borel-Cantelli lemma we conclude that

$$|Z_n| \le \alpha_n \tag{12}$$

w.p.1 for $n$ large enough. Now, from (6) and (12) we can write w.p.1 that, for $n$ large enough,

$$|X_{n+1}| \le \begin{cases} 2\alpha_n & \text{if } |T_n| \le \alpha_n, \\ |T_n| + Z_n & \text{if } |T_n| > \alpha_n. \end{cases}$$

Hence

$$|X_{n+1}| \le \max\big(2\alpha_n,\ |T_n| + Z_n\big) \le \max\big(2\alpha_n,\ (1 + \beta_n)|X_n| + Z_n - \gamma_n\big)$$

for large enough $n$ w.p.1. Lemma 1 with $\xi_n = |X_n|$, $a_n = 2\alpha_n$, $b_n = \beta_n$, $\delta_n = Z_n$, $c_n = \gamma_n$ yields the desired conclusion.

Remark: The above proof also goes through for the extended case considered by Dvoretzky, where the $\alpha$'s, $\beta$'s, $\gamma$'s are allowed to be random, with $\alpha_n \to 0$ uniformly w.p.1 and $\beta_n$, $\gamma_n$ satisfying (9) and (10) w.p.1.
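Theorem 1 covers the Robbins-Monro procedure [5]. The Python sketch below is an assumed toy setting, not taken from [2] or [5]. It uses a hypothetical regression function $M(x) = 2(x - \theta)$, observed with unit-variance Gaussian noise, and step sizes $a_n = 1/(n + 2)$. Measuring $X_n$ from $\theta$, the recursion has the form (6) with $T_n = (1 - 2a_n)X_n$ and $Y_n = -a_n\varepsilon_n$, so (7) and (8) hold. Conditions (9) and (10) can be checked with $\gamma_n = 2a_n\alpha_n$, $\alpha_n = 1/\log(n + 2)$, and any positive summable $\beta_n$.

```python
import random


def robbins_monro(theta=1.5, slope=2.0, x1=5.0, n_steps=100_000, seed=0):
    """Robbins-Monro search for the root theta of the hypothetical M(x) = slope * (x - theta)."""
    rng = random.Random(seed)
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / (n + 2)                                      # sum a_n = inf, sum a_n^2 < inf
        observation = slope * (x - theta) + rng.gauss(0.0, 1.0)  # noisy M(X_n)
        # In the notation of (6), with X_n measured from theta:
        #   T_n = (1 - slope * a_n) * (X_n - theta),  Y_n = -a_n * noise.
        x -= a_n * observation
    return x
```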

We now turn to the obvious multidimensional generalization of Theorem 1. The symbol $|t|$ will be used below to denote the length of a vector $t$.

THEOREM 2. The conditions are the same as in Theorem 1, with these modifications: $\{X_n\}$, $\{T_n\}$, $\{Y_n\}$ are $p$-dimensional random vectors; (8) should be interpreted as $\sum E|Y_n|^2 < \infty$; the absolute value in (9) should be read as length. The conclusion is that $|X_n| \to 0$ w.p.1.

PROOF. As in Theorem 1 we can assume

$$\sum \frac{E|Y_n|^2}{\alpha_n^2} < \infty. \tag{13}$$

For similar reasons we may also assume

$$\sum \alpha_n\gamma_n = \infty. \tag{14}$$

If we define, for random orthogonal transformations $P_n = P_{n, X_1, \ldots, X_n}$, $Z_n = P_nY_n$, then $\{Z_n\}$ satisfies (7), (8), and (13) and, as a consequence of (13),

$$|Z_n| < \alpha_n \tag{15}$$

w.p.1 for $n$ large enough. Choose $P_n$ so that $P_nT_n = (|T_n|, 0, \ldots, 0)$ and notice that

$$|X_{n+1}|^2 = |P_nX_{n+1}|^2 = (|T_n| + Z_{n1})^2 + \sum_{r=2}^{p} Z_{nr}^2, \tag{16}$$

where $Z_{nr}$ is the $r$th component of $Z_n$.

Fix $\omega$ (a point in the sample space) and choose $N(\omega)$ so that, for $n \ge N(\omega)$, (15) holds. If $|T_n| > 2\alpha_n$ we have, as consequences of (9) and (15),

$$0 < 2\alpha_n + Z_{n1} < |T_n| + Z_{n1} \le (1 + \beta_n)|X_n| - \gamma_n + Z_{n1}, \tag{17}$$

$$|X_n| > \frac{2\alpha_n + \gamma_n}{1 + \beta_n} = \rho_n \quad \text{(say)}. \tag{17'}$$

Thus (16) and (17) yield

$$|X_{n+1}|^2 \le \big((1 + \beta_n)|X_n| - \gamma_n + Z_{n1}\big)^2 + \sum_{r=2}^{p} Z_{nr}^2 = (1 + \beta_n)^2|X_n|^2 + |Z_n|^2 + 2Z_{n1}(1 + \beta_n)|X_n| - 2\gamma_nZ_{n1} - 2\gamma_n(1 + \beta_n)|X_n| + \gamma_n^2.$$

Let $-c_n$ be the sum of the last three terms. Then (15) and (17') show that $-c_n \le -2\alpha_n\gamma_n$ and, therefore, from (14), $\sum c_n = \infty$. Thus, whenever $|T_n| > 2\alpha_n$,

$$|X_{n+1}|^2 \le (1 + \beta_n)^2|X_n|^2 + 2Z_{n1}(1 + \beta_n)|X_n| + |Z_n|^2 - c_n. \tag{18}$$

If $|T_n| \le 2\alpha_n$, then $|X_{n+1}|^2 \le 9\alpha_n^2$. Thus, letting

$$1 + b_n = (1 + \beta_n)^2 + \frac{2Z_{n1}(1 + \beta_n)}{|X_n|},$$

we have

$$|X_{n+1}|^2 \le \max\big(9\alpha_n^2,\ (1 + b_n)|X_n|^2 + |Z_n|^2 - c_n\big). \tag{19}$$

We wish to apply Lemma 2 at this point, and to do so we put $a_n = 9\alpha_n^2$, $\delta_n = |Z_n|^2$, $\xi_n = |X_n|^2$, and $c_n$, $b_n$ as they are defined between (17') and (19). Since $\sum |Z_n|^2 < \infty$ w.p.1 is a consequence of (8), we have only to verify that $\sum b_n$ and $\sum b_n^2$ converge. Since $\sum \beta_n < \infty$ and, consequently, $\sum \beta_n^2 < \infty$, we have to show that $\sum Z_{n1}(1 + \beta_n)/|X_n|$ converges. Since $E\{Z_{n1} \mid X_1, \ldots, X_n\} = 0$ w.p.1 and since, by (17') and (13),

$$\sum E\Big\{\frac{(1 + \beta_n)^2 Z_{n1}^2}{|X_n|^2}\Big\} \le \sum \frac{(1 + \beta_n)^4\, E Z_{n1}^2}{(2\alpha_n + \gamma_n)^2} \le \max_n\,(1 + \beta_n)^4 \sum \frac{E|Z_n|^2}{4\alpha_n^2} < \infty,$$

we can apply Theorem D, p. 387, of [4] to draw the desired conclusion. In similar fashion we can show $\sum b_n^2 < \infty$, and this completes the proof of the theorem.

Remark: Theorem 2 and its extension permitting random $\alpha_n$, $\beta_n$, and $\gamma_n$ enable us to prove convergence properties of the multidimensional analogues of the Kiefer-Wolfowitz procedure. In particular, they fill a gap in [6], Section 5, where convergence w.p.1 is assumed. The author of [6] expresses his thanks to H. Kesten for bringing this to his attention.
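The remark refers to multidimensional analogues of the Kiefer-Wolfowitz procedure. A minimal Python sketch of one such analogue follows. It is not the procedure of [6] and says nothing about rates. It assumes a hypothetical quadratic regression function with maximum at $(1, -0.5)$, unit-variance noise, and step sequences $a_n = 0.5/n$, $c_n = n^{-1/3}$, chosen to satisfy the usual Kiefer-Wolfowitz conditions. For this quadratic the symmetric differences are exact slopes. Each step therefore splits as in (6) into a deterministic contraction $T_n$ toward the maximum and a conditionally centred noise term $Y_n$.

```python
import random


def kiefer_wolfowitz_2d(n_steps=50_000, seed=0):
    """Two-dimensional Kiefer-Wolfowitz search for the maximum of a hypothetical regression function."""
    rng = random.Random(seed)

    def noisy_f(x1, x2):
        # Hypothetical regression function with maximum at (1.0, -0.5), plus N(0, 1) noise.
        return -(x1 - 1.0) ** 2 - 2.0 * (x2 + 0.5) ** 2 + rng.gauss(0.0, 1.0)

    x = [0.0, 0.0]
    for n in range(1, n_steps + 1):
        a_n = 0.5 / n             # sum a_n = inf
        c_n = n ** (-1.0 / 3.0)   # c_n -> 0, sum a_n c_n < inf, sum (a_n / c_n)^2 < inf
        grad = []
        for i in range(2):
            up, down = list(x), list(x)
            up[i] += c_n
            down[i] -= c_n
            grad.append((noisy_f(*up) - noisy_f(*down)) / (2.0 * c_n))  # finite-difference slope
        x = [xi + a_n * gi for xi, gi in zip(x, grad)]                     # step uphill
    return x
```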


REFERENCES 


[1] H. D. Block, "On stochastic approximation," ONR Report, Department of Mathematics, Cornell University, Ithaca, New York.

[2] Aryeh Dvoretzky, "On stochastic approximation," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 39-56, University of California Press, 1956.

[3] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Stat., Vol. 23 (1952), pp. 462-466.

[4] Michel Loève, Probability Theory, D. Van Nostrand Company, New York, 1955.

[5] Herbert Robbins and Sutton Monro, "A stochastic approximation method," Ann. Math. Stat., Vol. 22 (1951), pp. 400-407.

[6] Jerome Sacks, "Asymptotic distribution of stochastic approximation procedures," Ann. Math. Stat., Vol. 29 (1958), pp. 373-405.

[7] J. Wolfowitz, "On stochastic approximation methods," Ann. Math. Stat., Vol. 27 (1956), pp. 1151-1155.

606 J. R. ISBELL


ON A PROBLEM OF ROBBINS 


By J. R. ISBELL
University of Washington 


1. Introduction. This note concerns a sequential decision problem raised by 
Herbert Robbins [2]. The problem is not solved; in fact, it is not known if there 
is a uniformly best procedure. A procedure is given here which is uniformly 
better than the one proposed in [2] and is best at least in a special case. 

The nature of the problem is this: given two coins with unknown probabilities $p_1$, $p_2$ of coming up heads, to prescribe a rule for making an infinite sequence of tosses, choosing the coin for the $n$th toss as a function of the history of the sequence since the $(n - r)$th toss (inclusive). The memory length $r$ is fixed. The aim is to maximize the frequency of heads.

The rule proposed here is best in case $p_1$ or $p_2$ is 0. We cannot say "the best," since many rules have the same effects in this case. The rule may be briefly stated: "Change coins when one coin shows tails $r$ successive times, or when $r - 1$ tails with one coin are followed by a single toss with the other coin, which is tails." Robbins' rule [2] calls for changing in these cases and also whenever the first toss with a new coin is tails. For $r \le 2$, the rules coincide. Otherwise the present rule is better except in two trivial cases, $p_1 = p_2$ and $\max(p_1, p_2) = 1$.


2. Formulation. The description of the memory requires some amplification for the case $n \le r$. (None is given in [2].) Here we shall regard the sequence of tosses as a Markov process with $4^r$ states, namely the states of the memory. We consider that the process may begin in any state, and we propose to evaluate any procedure according to the results it yields starting from the worst possible state.

This is an artificial description which one might prefer to avoid. On the other hand, any decision procedure which might be optimal according to some other version of the problem but is disqualified by our artificial start could be described as unstable: an experimenter using it could not afford to make finitely many errors in recording his results. This matter will be illustrated with an example at the end of the paper.

Received July 8, 1958; revised January 5, 1959.

Given a rule $R$, an initial state $i$, and probabilities $p_1$ and $p_2$ of obtaining heads with each coin, one has a definite stationary Markov process. Then for each of the $4^r$ states $s$, the frequency of state $s$ converges with probability 1 to a definite limit. The probability of obtaining heads on a toss in state $s$ is either $p_1$ or $p_2$ (depending on $s$ and $R$). The tosses are independent events, and therefore the frequency of heads converges with probability 1 to a definite limit $f(R, i, p_1, p_2)$.

We propose to measure the worth of a rule $R$ by the function $w(R, p_1, p_2)$, defined as the smaller of $\min_i f(R, i, p_1, p_2)$ and $\min_i f(R, i, p_2, p_1)$. Presumably the permutation of $p_1$ and $p_2$ here has the same effect as Robbins' requirement [2] that $f$ be symmetric in $p_1$ and $p_2$. (This is not an exact statement of Robbins' requirement, and cannot be exact, because of the vagueness concerning the start which we have already noted.) This still does not give us a numerical value for each $R$, nor does a numerical value seem appropriate. One may hope that for each memory length $r$ there is a rule with maximum worth $w$ for all $p_1$, $p_2$.

The rule $R_r^*$, which is the subject of this note, prescribes tossing the same coin as on the toss immediately preceding in all but 4 of the $4^r$ states; one changes coins when the memory records $r$ tails with the same coin, or $r - 1$ tails with one coin followed by a tail with the other coin.

The worth $w(R_r^*, p_1, p_2)$ is readily computed. For $r = 1$ and $r = 2$ the rule is identical with Robbins' $R_r$, and the worth is the value given in [2]:

$$w(R_r, p_1, p_2) = \frac{p_1 q_2^r + p_2 q_1^r}{q_1^r + q_2^r} \qquad (r = 1, 2),$$

where $q_j = 1 - p_j$ for $j = 1, 2$. Here Robbins can use an argument on blocks of consecutive tosses with the same coin (untroubled by the vagueness we have noted), because these blocks are independent events. For $r > 2$, using $R_r^*$, those blocks of tosses are not independent. We have established the existence of $f(R_r^*, i, p_1, p_2)$ by considering the $4^r$ states of the process. For computing its value it is convenient to consider four special states $S_i$, $L_i$ ($i = 1, 2$), called marked states. $S_i$ is the state in which $R_r^*$ prescribes that we terminate a short sequence (one toss long) of tosses of coin $i$, and $L_i$ is the corresponding long-sequence-ending state. Except in trivial cases, with probability 1, each marked state is succeeded by 0 or more unmarked states and then a next marked state. The subsequence consisting of the marked states is again a stationary Markov process. Let $\sigma_i$, $\lambda_i$ be the respective relative frequencies of the marked states. We have $\sigma_1 + \lambda_1 = \tfrac12 = \sigma_2 + \lambda_2$, $\sigma_1 = q_1\lambda_2$, and $\sigma_2 = q_2\lambda_1$. The proportion $\sigma_1 : \lambda_1 : \sigma_2 : \lambda_2$ is then $q_1p_2 : p_1 : q_2p_1 : p_2$. Now the occurrences of $S_1$ are followed by blocks of tosses with coin 2 which terminate precisely when tails first comes up $r$ times in succession; their expected length [2] is $(1 - q_2^r)/(p_2q_2^r)$. The blocks following $L_1$ are governed exactly by Robbins' $R_r$, and have expected length $1/q_2^r$ [2]. We find $w$ to be

$$w(R_r^*, p_1, p_2) = \frac{p_1 q_2^r (1 - q_1^r q_2) + p_2 q_1^r (1 - q_1 q_2^r)}{q_2^r (1 - q_1^r q_2) + q_1^r (1 - q_1 q_2^r)}$$

for $r > 2$.
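The marked-state computation can be checked by simulation. The Python sketch below is an illustration under stated assumptions, and its function names are hypothetical. It plays $R_r^*$ exactly as stated above: switch coins when the memory holds $r$ tails with one coin, or $r - 1$ tails with one coin followed by a tail with the other. It keeps the current coin while fewer than $r$ tosses have been made, which is an arbitrary choice for the start. It then compares the long-run frequency of heads with the closed form for $r > 2$.

```python
import random
from collections import deque


def worth_star(p1, p2, r):
    """Closed form for w(R_r^*, p1, p2), r > 2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    num = p1 * q2 ** r * (1 - q1 ** r * q2) + p2 * q1 ** r * (1 - q1 * q2 ** r)
    den = q2 ** r * (1 - q1 ** r * q2) + q1 ** r * (1 - q1 * q2 ** r)
    return num / den


def worth_robbins(p1, p2, r):
    """Worth of Robbins' R_r."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 * q2 ** r + p2 * q1 ** r) / (q1 ** r + q2 ** r)


def simulate_star(p1, p2, r, n_tosses, seed=0):
    """Frequency of heads under R_r^*, started with coin 1 and an empty memory."""
    rng = random.Random(seed)
    probs = (p1, p2)
    memory = deque(maxlen=r)  # last r tosses as (coin, is_tail), oldest first
    coin, heads = 0, 0
    for _ in range(n_tosses):
        if len(memory) == r and all(t for _, t in memory):
            coins = [c for c, _ in memory]
            long_run = len(set(coins)) == 1                                # r tails, same coin
            short = len(set(coins[:-1])) == 1 and coins[-1] != coins[0]  # r - 1 tails, then other coin's tail
            if long_run or short:
                coin = 1 - coin
        tail = rng.random() >= probs[coin]
        heads += not tail
        memory.append((coin, tail))
    return heads / n_tosses
```

For $r = 3$, $p_1 = 0.6$, $p_2 = 0.3$, the closed form gives a slightly larger worth for $R_3^*$ than for Robbins' $R_3$, in line with the result stated in Section 3.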


3. The results. The rule $R_r^*$ may, for anything I know to the contrary, be uniformly best for each $r$. What will be proved here is this:

The worth $w(R_r^*, p_1, p_2)$ is greater than or equal to $w(R_r, p_1, p_2)$ for all values of $r$, $p_1$, and $p_2$, with equality in the three cases $r \le 2$, $p_1 = p_2$, $\max(p_1, p_2) = 1$. It is greater than or equal to $w(R, p_1, p_2)$ for any rule $R$ using the same memory length $r$, in at least two cases: the case $r = 1$, and the case $\min(p_1, p_2) = 0$.

The assertions comparing $R_r^*$ with $R_r$ follow from the previous computation. The assertion concerning $r = 1$ follows from simple computations which the reader may supply. (There are 4 states and 16 rules in this case.)

Consider next the case that $p_1 = 0$ and $r \ge 3$. We may assume $p_2$ is not 0 (all rules would have worth 0) nor 1 ($R_r^*$ would have worth 1). Let $R$ be any rule using memory length $r$. We shall consider subcases, assuming first (a) that when the memory records $r$ successive tails with coin 1, $R$ prescribes the use of coin 1 again. But if this is the initial state, then with probability 1 the process consists entirely of tosses of coin 1 and entirely of tails. Thus $w(R, 0, p_2) = 0 < w(R_r^*, 0, p_2)$. Now suppose (b) that when the memory records $r$ successive tails with coin 2, $R$ prescribes the use of coin 2 again. The argument under (a) shows that $w(R, p_2, 0) = 0$, and this is the same as $w(R, 0, p_2)$ from the definition of $w$. There remains subcase (c): neither (a) nor (b). Then with probability 1 each coin is used infinitely often. As in [2], define $x_k$, for $k = 1, 2, \ldots$, as the length of the $k$th block of consecutive tosses of coin 1, and $y_k$ as the length of the $k$th block of consecutive tosses of coin 2. Every $x_k$ is at least 1. The expected value of $y_k$ given $x_1, \ldots, x_k, y_1, \ldots, y_{k-1}$ is at most $A = (1 - q_2^r)/(p_2q_2^r)$; for this is the expected length of a sequence of tosses terminated precisely at the first run of $r$ consecutive tails, and the present sequence must terminate at or before that run. Then the expected frequency of heads in the first $2k$ blocks is at most $Ap_2/(1 + A)$. We already know that the frequency of heads in the first $n$ tosses converges with probability 1 to $f(R, i, 0, p_2)$, and therefore $f(R, i, 0, p_2) \le Ap_2/(1 + A) = w(R_r^*, 0, p_2)$. With a glance at the definition of $w$, the proof for this case is complete.

By the symmetry of $w$, the case $p_2 = 0$, $r \ge 3$, is also accounted for. We need only complete the argument for the case $p_1 = 0$ and $r = 2$. As above, we may assume $p_2$ is neither 0 nor 1; moreover, subcases (a) and (b) are settled by the same reasoning as above.

We are now considering a rule $R$ using memory length 2 which prescribes changing coins whenever the memory shows two tails with the same coin. Let us subdivide this case according to what $R$ prescribes when the memory records tails with coin 1 followed by tails with coin 2. (i) Suppose $R$ prescribes changing back to coin 1 in this state. Observe that with probability 1, coin 1 comes up tails every time it is used. Then the argument under subcase (c) of the previous case ($r \ge 3$, $p_1 = 0$) can be modified: every block of tosses with coin 1 is at least one toss long, and the expected length of any block of tosses with coin 2 is at most $1/q_2^2$. This leads to the conclusion $w(R, 0, p_2) \le w(R_2^*, 0, p_2)$ for subcase (i).

We are left with subcase (ii): $R$ prescribes using coin 2 when the memory records "coin 1, tails; coin 2, tails". I do not know an argument for this case which would avoid the computation of six estimates for $w(R, p_2, 0)$. The case hypothesis guarantees that blocks of tosses with coin 2 are at least two tosses long, unless the preceding block of tosses with coin 1 ended with heads. There are three memory states which may have non-zero frequency and which end "coin 1, heads" (only three, since coin 2 here never shows heads). Then there are $2^3$ possibilities as to what $R$ prescribes in these states; but the eight reduce to six because some rules exclude some memory states.

Five of the six subcases seem absurd (stopping with heads). All I can say is that routine computations suffice to dispose of them. In the remaining case every block of tosses with coin 2 has length exactly 2, and a block of tosses with coin 1 lasts at most until tails shows twice, so its expected length is at most $(1 - q_2^2)/(p_2q_2^2)$. Hence

$$w(R, p_2, 0) \le \frac{p_2(1 - q_2^2)}{1 - q_2^2 + 2p_2q_2^2} \le \frac{p_2}{1 + q_2^2} = w(R_2^*, p_2, 0),$$

where the second inequality is equivalent to $q_2^2(1 - q_2)^2 \ge 0$.

4. Concluding remarks. In my review [1] of Robbins' paper [2], I stated that Robbins' rule "appears to be best for $r = 2$". This remark was based on computation of the effects of the rules which treat the coins symmetrically and never change coins when the last toss was heads; there are only eight of these. It would certainly be surprising if $R_2$ (which is $R_2^*$) were not uniformly best; but a proof is still lacking.

Finally, consider the following alternative formulation of the problem, which is consistent with the incomplete description in [2]. For the $n$th toss, $n \ge r + 1$, the situation is as we have described it; but for the first $r$ tosses the experimenter may use a special rule treating this part of the process as a collection of transient states. With this formulation, for $r = 4$, there is no uniformly best rule. This may be established by checking that there is no single rule which both (a) does as well for $p_1 = 0$, $p_2 = \tfrac34$, as $R_4^*$, and (b) does as well for $p_1 = 0$, $p_2 = \tfrac14$, as a certain rule $S_4$, described herewith. The effect of $S_4$ will be (with probability 1) to set up an alternation of blocks of tosses with coin 1, one toss long, and blocks of tosses with coin 2 which end only when heads comes up $r$ successive times. When $p_2$ is less than $\tfrac12$, these are longer than the corresponding blocks which $R_4^*$ gives, which end when tails comes up $r$ successive times. An argument similar to those we have been using shows that this is the best one could possibly hope for. To see that it is possible, examine the rule $S_4$, which calls for using the coin used last in all but eight memory states: (1-2) four successive heads with either coin; (3-4) three heads with one coin, tails with the other; (5-6) tails twice with one coin, then twice with the other; (7-8) the transient states succeeding the first two tosses, in case the same coin was used and came up tails both times.

One can easily modify $S_4$ to obtain an $S_r$, as described above, for $r = 5, 6, \ldots$. A little study of $S_4$ suggests objections to its use even if it is permitted: its worth is a discontinuous function of $p_1$ and $p_2$, less than the worth of $R_4^*$ for almost all values. There is also the objection mentioned earlier: the limiting frequency of heads can be changed by finitely many errors.


REFERENCES 


[1] J. Isbell, Review of [2], Math. Reviews, Vol. 18 (1957), p. 606.

[2] H. Robbins, "A sequential decision problem with a finite memory," Proc. Nat. Acad. Sci., Vol. 42 (1956), pp. 920-923.




ACKNOWLEDGMENT OF PRIORITY 
By J. N. K. Rao 


Iowa State College 


I am grateful to I. M. Chakravarti of the Indian Statistical Institute, Calcutta, for kindly bringing to my attention that Theorem I in my note "A characterization of the normal distribution" (Ann. Math. Stat., Vol. 29 (1958), p. 914) had been derived under a less stringent condition by R. G. Laha in two notes, "On an extension of Geary's theorem" (Biometrika, Vol. 40 (1953), p. 228) and "On a characterization of the multivariate normal distribution" (Sankhya, Vol. 14 (1954), p. 367). I wish to acknowledge the priority of Dr. Laha's results, which were overlooked by me and by Seymour Geisser. (My note is a follow-up of Geisser's "A note on the normal distribution," Ann. Math. Stat., Vol. 27 (1956), p. 858.)




ADDENDUM 


By CHARLES E. CLARK AND G. TREVOR WILLIAMS


Booz, Allen and Hamilton and The Johns Hopkins University 


The references listed in our "Distributions of the members of an ordered sample" (Ann. Math. Stat., Vol. 29 (1958), pp. 862-870) should have included "Statistical treatment of censored data. I. Fundamental formulae," by F. N. David and N. L. Johnson (Biometrika, Vol. 41 (1954), pp. 228-240). This earlier paper considers, inter alia, the basic problem of our paper. Both papers use power series expansions of the inverse of the distribution function. Since the analysis of the earlier paper leads to expressions in powers of $(N + 2)$ and our paper leads to reciprocals of factorials of $N + 2$, many results of the two papers are identical to terms of order $N^{-1}$; in other words, both papers reproduce the classical approximations.





ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Pittsburgh, Pennsylvania Meeting of the Institute, 
March 19-21, 1959.) 


1. Mathematical Problems Associated with Measurements Made by Matching with Known Standards. W. S. Connor and N. C. Severo, National Bureau of Standards.

The process of matching can be thought of as measuring the unknown true value of some characteristic of an object, and comparing the measured value with a series of equally spaced standard values. Let the true value of the unknown be 0, the measured value be $Y$, the standard value closest to $Y$ be $X$, and this closest standard value minus the measured value be $Z$. Then $X = Y + Z$. Further, let $Y$ be a random variable which is normally distributed with mean zero and variance $\sigma^2$, let $Z$ be uniformly distributed on the interval $(-a, a)$, and let $Y$ and $Z$ be statistically independent. The probability $P_A$ that an unknown object will be assigned the standard value which is closest to its true value is shown to be

$$P_A = \Pr\{-a < X \le a\} = \big[\Phi(2a/\sigma) - \Phi(-2a/\sigma)\big] + (\sigma/a)\big[\phi(2a/\sigma) - \phi(0)\big],$$

where

$$\Phi(t) = \int_{-\infty}^{t} \phi(u)\,du = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du.$$

The solution is obtained by convenient use of the usual convolution formula for the probability density function of the sum of two statistically independent random variables. By similar methods, the probability $P_B$ that two independent measurements on the same object will result in the same assigned standard value, regardless of whether it is the closest standard value, is shown to be

$$P_B = \big[\Phi(\sqrt{2}\,a/\sigma) - \Phi(-\sqrt{2}\,a/\sigma)\big] + (\sqrt{2}\,\sigma/a)\big[\phi(\sqrt{2}\,a/\sigma) - \phi(0)\big].$$
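The $P_A$ formula can be checked against a direct simulation of $X = Y + Z$. The Python sketch below uses hypothetical names, with `math.erf` supplying $\Phi$. It evaluates the closed form and a Monte Carlo estimate of $\Pr\{-a < X \le a\}$. It also encodes the observation that the $P_B$ expression is the $P_A$ expression with $\sigma$ replaced by $\sqrt{2}\,\sigma$.

```python
import math
import random


def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def p_a(a, sigma):
    c = 2.0 * a / sigma
    return (Phi(c) - Phi(-c)) + (sigma / a) * (phi(c) - phi(0.0))


def p_b(a, sigma):
    # The P_B expression equals the P_A expression evaluated at sqrt(2) * sigma.
    return p_a(a, math.sqrt(2.0) * sigma)


def p_a_monte_carlo(a, sigma, reps=200_000, seed=0):
    """Fraction of draws with -a < Y + Z <= a, Y ~ N(0, sigma^2), Z ~ U(-a, a)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = rng.gauss(0.0, sigma) + rng.uniform(-a, a)  # X = Y + Z
        hits += -a < x <= a
    return hits / reps
```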


2. The First and Second Moment Structure of the Maximum Likelihood Estimators of the Parameters of a Multivariate Normal Distribution with Double Samples. Jack Napier, Bell Telephone Laboratories, Whippany, New Jersey.

Let $x$ be a $p$-dimensional, non-singular, normally distributed vector random variable whose coordinates are partitioned to form a $q$-dimensional vector random variable $y$ and an $r$-dimensional vector random variable $z$ ($p = q + r$; $q \ge 1$; $r \ge 1$). It is assumed that a random sample of $n$ observations is taken from the distribution of $x$ and an independent random sample of $N - n$ observations is taken from the distribution of $y$. The first and second moments of the maximum likelihood estimators of the parameters of the distribution of $x$ are presented for the combined sample.


3. Some Properties of Stirling’s Numbers of the Second Kind. Jonn L. Baca, 
Florida State University. 


In the course of investigating the asymptotic behavior of a probability distribution discovered by Leo Katz and James Powell (Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 621-626), a relationship was noted between one of Sukhatme's Bipartitional Functions and Stirling's Numbers of the Second Kind. Jordan (Calculus of Finite Differences, Chelsea Pub. Co., New York, 1947) deals with many properties of Stirling's Numbers of the Second Kind. Denote the sum on the superscript of these numbers, for fixed subscript $n$, by $f(n)$. A general formula for the $m$th difference of $f(n)$ is proved.




4. Minimax Solutions to Trichotomies. Lonnie L. Lasman, Florida State Uni- 
versity. 


This paper presents an explicit method of obtaining minimax solutions to a special class
of problems. Let x be a real-valued random variable with probability density
$f(x, \theta) = b(\theta)e^{\theta x}g(x)$, a continuous function of the parameter $\theta$, and consider the
problem of accepting one of the three hypotheses $H_i: \theta = \theta_i$, $i = 1, 2, 3$. Given three
possible decisions, $a_1$, $a_2$ or $a_3$, where $a_i$ means "accept $H_i$", and a set of losses $w_{ij}$,
the loss incurred from taking action $a_j$ when $H_i$ is the correct hypothesis, it is shown that
if a pair $(x', x'')$ with $x' < x''$ exists such that the risks to the statistician from a choice of
a test procedure $\varphi$ are the same regardless of the value of $\theta$, then the minimax solution
to the problem is a form of the Bayes solution. Denote such a solution by $\varphi^*$. In such a case
there is a common risk $V$ regardless of which hypothesis is true. It is then shown that there
is a least favorable a priori distribution $\xi^*$ on $\theta$ such that the risk when nature uses $\xi^*$
and the statistician uses $\varphi$ is never less than the risk using $\xi^*$ and $\varphi^*$; this latter risk
is also shown to be $V$, and thus the minimax solution is established. An example using the
normal curve is given.
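The equal-risk condition can be made concrete with a small numerical sketch. The setup is entirely hypothetical and is not the paper's normal example: x is N(θ, 1) with θ = -1, 0, 1, the losses are 0 for a correct decision and 1 otherwise, and the rule takes $a_1$ below x', $a_2$ between x' and x'', and $a_3$ above x''. Because this setup is symmetric, we look for x'' = -x' = c and bisect on c until the three risks coincide. The code only locates the equalizer. The abstract's result is what identifies such a rule as the minimax (Bayes) solution.

```python
from math import erf, sqrt


def Phi(t):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def risks(x1, x2, thetas, w):
    """Risk under each H_i of the rule a_1 if x < x1, a_2 if x1 <= x < x2, a_3 otherwise.

    x ~ N(theta_i, 1) under H_i; w[i][j] is the loss of action a_j when H_i is true.
    """
    out = []
    for i, th in enumerate(thetas):
        probs = [Phi(x1 - th), Phi(x2 - th) - Phi(x1 - th), 1.0 - Phi(x2 - th)]
        out.append(sum(w[i][j] * probs[j] for j in range(3)))
    return out


def symmetric_equalizer(thetas=(-1.0, 0.0, 1.0), w=((0, 1, 1), (1, 0, 1), (1, 1, 0))):
    """Bisect on c for the symmetric rule (x', x'') = (-c, c) until R_1 = R_2 (= R_3)."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        c = 0.5 * (lo + hi)
        r = risks(-c, c, thetas, w)
        if r[0] < r[1]:      # R_1 grows and R_2 shrinks as c increases
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return c, risks(-c, c, thetas, w)


c, R = symmetric_equalizer()
print(round(c, 4), [round(r, 4) for r in R])
```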


5. The Distribution of the Number of Successes in a Sequence of Dependent
Trials. K. R. Gabriel, University of North Carolina.


A sequence of dependent trials is considered which has the properties of a Markov chain
with two ergodic states and transition probabilities: failure to success $P(1 - d)$, success
to success $P(1 - d) + d$, $P$ being the stationary probability of success. The exact
distribution of the number of successes $S$ in $n$ trials is derived, and the first four moments
are obtained exactly. Approximate formulae are suggested as follows:

$$\kappa_1 = nP, \qquad \kappa_2 = \frac{nP(1 - P)(1 + d)}{1 - d},$$

$$\gamma_1 = \frac{(1 - 2P)(1 + 4d + d^2)}{[nP(1 - P)]^{1/2}(1 + d)^{3/2}(1 - d)^{1/2}}, \qquad \gamma_2 = \frac{(1 - 6P + 6P^2)(1 + 10d + d^2)}{nP(1 - P)(1 - d^2)}.$$

From the theory of recurrent events it is shown that $S$ is asymptotically normally
distributed with mean and variance as above. Numerical computations of exact and
approximate cumulants for selected values of $n$, $P$ and $d$ are presented. These give an idea
of the characteristics of the distribution and the rapidity of its approach to normality. The
distribution is illustrated by an application to the number of rainy days in a month, with
$n = 31$, $P = .473$ and $d = .381$, and the normal approximation is seen to be very good.
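The exact distribution of S can be computed by a forward recursion over pairs (successes so far, last outcome). The sketch below assumes that the first trial is drawn from the stationary law, with success probability P. The abstract does not spell this out. The sketch uses the rainy-day parameters quoted above. It compares the exact variance with the large-n approximation nP(1 - P)(1 + d)/(1 - d). It also compares it with the finite-n value P(1 - P)[n(1 + d)/(1 - d) - 2d(1 - d^n)/(1 - d)^2], which holds because successive indicators of this chain have lag-k correlation d^k.

```python
def success_distribution(n, P, d):
    """Exact distribution of S, the number of successes in n trials of the chain.

    Assumes the first trial is a success with the stationary probability P.
    """
    p_fs = P * (1 - d)        # failure -> success
    p_ss = P * (1 - d) + d    # success -> success
    # probs[(k, last)] = P(k successes so far, last outcome `last`), last = 1 for success
    probs = {(1, 1): P, (0, 0): 1 - P}
    for _ in range(n - 1):
        nxt = {}
        for (k, last), pr in probs.items():
            p_succ = p_ss if last else p_fs
            nxt[(k + 1, 1)] = nxt.get((k + 1, 1), 0.0) + pr * p_succ
            nxt[(k, 0)] = nxt.get((k, 0), 0.0) + pr * (1 - p_succ)
        probs = nxt
    dist = [0.0] * (n + 1)
    for (k, _), pr in probs.items():
        dist[k] += pr
    return dist


def mean_var(dist):
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var


n, P, d = 31, 0.473, 0.381            # rainy-day example from the abstract
dist = success_distribution(n, P, d)
mean, var = mean_var(dist)
approx_var = n * P * (1 - P) * (1 + d) / (1 - d)
exact_var = P * (1 - P) * (n * (1 + d) / (1 - d) - 2 * d * (1 - d ** n) / (1 - d) ** 2)
print(mean, var, exact_var, approx_var)
```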


6. Values of Games with Moves in [0, 1]. (Preliminary report) Martin Fox,
University of California, Berkeley.


Consider a zero-sum, two-person game in which the players move alternately by choosing
points in [0, 1]. The game ends when a total of n moves have been made. Let $X_i$ be the
point chosen on the ith move (by player I if i is odd, by player II if i is even). Let the
payoff be $f(X_1, X_2, \ldots, X_n)$, where f is continuous. When the ith move has been made,
the player who makes the (i + 1)st move observes $\varphi_i(X_1, \ldots, X_i)$ $(i = 1, \ldots, n - 1)$.
If the $\varphi_i$ are all constant, we have the case of no information, so that the game has a value
according to Ville's minimax theorem. If the $\varphi_i$ are all one-to-one, we have the case of
perfect information, so the game has a value. In the present paper an example is presented to
show that for $n \ge 3$ these games do not always have values. For the case n = 2 it is proved
that these games always have values. Existence of a value is proved for a special case with
n = 3. The author is seeking additional conditions guaranteeing values of these games for
arbitrary n.
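In the perfect-information case (all $\varphi_i$ one-to-one), the value is obtained by backward induction, alternating sup and inf over the moves. The sketch below restricts every move to a finite grid on [0, 1]. This restriction is an assumption made only so that something can be computed. The sketch says nothing about the partial-information games that are the subject of the paper.

```python
def perfect_info_value(f, n, grid_size=101):
    """Backward-induction value of the perfect-information game on a grid of [0, 1].

    Player I (odd moves) maximizes f and player II (even moves) minimizes it; every
    mover sees all earlier moves.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]

    def value(history):
        if len(history) == n:
            return f(*history)
        options = [value(history + (x,)) for x in grid]
        # an even number of moves made so far means the next move is odd (player I)
        return max(options) if len(history) % 2 == 0 else min(options)

    return value(())


print(perfect_info_value(lambda x1, x2: (x1 - x2) ** 2, 2))                 # 0.0
print(perfect_info_value(lambda x1, x2: x1 + x2, 2))                        # 1.0
print(perfect_info_value(lambda x1, x2, x3: x1 * x3 - x2, 3, grid_size=11))  # 0.0
```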







7. Explicit Results for the Dam with Poisson Input. Joseph M. Gani and N.
U. Prabhu, Columbia University and Karnatak University.


Let Z(t) be the content at time $0 \le t < \infty$ of an infinite dam, fed by Poisson inputs of
magnitude h with parameter $\lambda$, and subject to a steady continuous release ceasing when
Z(t) = 0. The distribution function F(z, t) of Z(t) then satisfies the difference-differential
equation

$$\frac{\partial F(z,t)}{\partial t} - \frac{\partial F(z,t)}{\partial z} = -\lambda\{F(z,t) - F(z - h,t)\} \qquad (0 \le z < \infty).$$

This particular case of Takács's integro-differential equation for the d.f. of the waiting time
in a single-server queue yields an explicit solution for F(z, t). The probability of first
emptiness of the dam at time $T = z_0 + rh$ $(r = 0, 1, 2, \ldots)$, starting with $Z(0) = z_0$, is given by
$g(z_0, T) = e^{-\lambda T}\lambda^r z_0 T^{r-1}/r!$; from this, the probability of emptiness at time t (not
necessarily for the first time) may be derived as

$$F(0, t) = \sum_{0 \le j \le (t - z_0)/h} e^{-\lambda t}\lambda^j (t - jh)\, t^{j-1}/j!.$$

Solving the difference-differential equation directly, the d.f. F(z, t) is finally found to be

$$F(z, t) = \sum_{0 \le r \le z/h} e^{-\lambda(rh - z)}\{\lambda(rh - z)\}^r F(0, z + t - rh)/r!.$$
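The explicit results lend themselves to a direct numerical check. The sketch below assumes release at unit rate, which is the rate implied by the difference-differential equation. It codes the three closed forms:

- $g(z_0, T) = e^{-\lambda T}\lambda^r z_0 T^{r-1}/r!$ at $T = z_0 + rh$;
- $F(0, t) = \sum_{0 \le j \le (t - z_0)/h} e^{-\lambda t}\lambda^j (t - jh) t^{j-1}/j!$;
- $F(z, t) = \sum_{0 \le r \le z/h} e^{-\lambda(rh - z)}\{\lambda(rh - z)\}^r F(0, z + t - rh)/r!$.

It then compares them with a simple simulation of the dam. The parameter values are hypothetical.

```python
import math
import random


def first_emptiness(z0, r, lam, h):
    """g(z0, T) at T = z0 + r h: probability that the dam first empties at time T."""
    T = z0 + r * h
    return math.exp(-lam * T + r * math.log(lam) + math.log(z0)
                    + (r - 1) * math.log(T) - math.lgamma(r + 1))


def prob_empty(t, z0, lam, h):
    """F(0, t) = P(Z(t) = 0); zero before time z0, when the dam cannot yet be empty."""
    if t < z0:
        return 0.0
    total, j = 0.0, 0
    while j * h <= t - z0:
        total += math.exp(-lam * t) * lam ** j * (t - j * h) * t ** (j - 1) / math.factorial(j)
        j += 1
    return total


def cdf(z, t, z0, lam, h):
    """F(z, t) = P(Z(t) <= z), summing over 0 <= r <= z / h."""
    total, r = 0.0, 0
    while r * h <= z:
        u = lam * (r * h - z)
        total += math.exp(-u) * u ** r / math.factorial(r) * prob_empty(z + t - r * h, z0, lam, h)
        r += 1
    return total


def simulate(t, z0, lam, h, rng):
    """Content at time t: inputs of size h at Poisson(lam) epochs, unit-rate release."""
    z, clock = z0, 0.0
    while True:
        gap = rng.expovariate(lam)
        if clock + gap > t:
            return max(0.0, z - (t - clock))
        z = max(0.0, z - gap) + h
        clock += gap


rng = random.Random(1)
lam, h, z0, t = 0.5, 1.0, 1.0, 3.0
draws = [simulate(t, z0, lam, h, rng) for _ in range(20000)]
print(prob_empty(t, z0, lam, h), sum(x == 0.0 for x in draws) / len(draws))
print(cdf(1.5, t, z0, lam, h), sum(x <= 1.5 for x in draws) / len(draws))
```

With $\lambda h < 1$ the dam empties eventually with probability one, so the first-emptiness probabilities summed over r should come to 1. This gives a further check.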


8. Some Stochastic Processes with Application to Counter Models. Ronald
Pyke, Columbia University.


Let $\{Y_n: n > -\infty\}$ be a renewal process with common distribution function H, and let
$\{t_j, j > -\infty\}$ be successive time points of discontinuity for a doubly infinite Poisson
process. Let f be any real-valued function defined on $R^2$ for which $\iint |f(t, y)|\, dH(y)\, dt < \infty$.
The processes $\{\eta(t): t > 0\}$ and $\{\eta^*(t): -\infty < t < \infty\}$ determined by

$$\eta(t) = \sum_{0 \le t_j \le t} f(t - t_j, Y_j), \qquad \eta^*(t) = \sum_{-\infty < t_j \le t} f(t - t_j, Y_j)$$

are studied and their one-dimensional characteristic functions obtained. For given
$0 < a \le b$, either of these processes is said to be in state A at time t if its value at t is at most b
and if the process has been less than a at some time since it last exceeded b. The expected
number of counts (i.e., transitions from state A to state B), as well as the expected time in
state A, during (0, t) are studied. For one case, approximations to the expected number of
counts are obtained by the method of steepest descents.
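The counting rule is a hysteresis (trigger) rule. A sampled path is "armed" once it drops below a, and a count is recorded when an armed path next exceeds b. The sketch below applies that rule to the process η(t) computed on a time grid. Several choices are hypothetical and are not from the paper: the response f(u, y) = y e^{-u}, independent exponential marks drawn from H, the rate, and the levels. The initial state is also assumed: the path starts in state A only if it starts below a.

```python
import math
import random


def count_A_to_B(path, a, b):
    """Number of A -> B transitions along a sampled path, for levels 0 < a <= b."""
    armed = path[0] < a            # True while the process is in state A
    counts = 0
    for value in path[1:]:
        if armed and value > b:
            counts += 1
            armed = False
        elif not armed and value < a:
            armed = True
    return counts


def shot_noise_path(rate, n_steps, step, f, draw_mark, rng):
    """eta(t) = sum of f(t - t_j, Y_j) over Poisson epochs 0 <= t_j <= t, on a time grid."""
    horizon = n_steps * step
    epochs, marks, s = [], [], rng.expovariate(rate)
    while s <= horizon:
        epochs.append(s)
        marks.append(draw_mark(rng))
        s += rng.expovariate(rate)
    grid = [i * step for i in range(n_steps + 1)]
    return [sum(f(t - tj, y) for tj, y in zip(epochs, marks) if tj <= t) for t in grid]


rng = random.Random(7)
path = shot_noise_path(2.0, 1000, 0.05, lambda u, y: y * math.exp(-u),
                       lambda r: r.expovariate(1.0), rng)
print(count_A_to_B(path, a=1.0, b=3.0))
```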


9. Use of Series Expansion in Estimation Problems for Distributions Involving
More Than One Parameter. (Preliminary report) Y. S. Sarue, University
of Alberta. (By title)


Guttman (Biometrika, Vol. 45 (1958), pp. 565-567) has given a method of determining
an unbiased minimum variance estimator without taking conditional expectations with
respect to a sufficient statistic, under certain regularity conditions. The same method is
extended to distributions depending on more than one parameter. If $t_1, t_2, \ldots, t_l$ are
sufficient statistics which assume non-negative integer values with probability

$$k_t\, \varphi_1^{t_1}\varphi_2^{t_2}\cdots\varphi_l^{t_l}\, m(\theta_1, \theta_2, \ldots, \theta_l),$$

where the $\varphi_i$ are functions of the $\theta_i$ and $k_t$ is a function of the $t_i$, and if

$$G(\theta_1, \theta_2, \ldots, \theta_l) = g(\theta_1, \theta_2, \ldots, \theta_l)/m(\theta_1, \theta_2, \ldots, \theta_l)$$

can be expanded as a power series of the form $\sum A_t \varphi_1^{t_1}\cdots\varphi_l^{t_l}$, where $A_t$ is a function
of the $t_i$, then an unbiased minimum variance estimator of $g(\theta_1, \theta_2, \ldots, \theta_l)$ is $A_t/k_t$ $(k_t \ne 0)$.
This method can be used to obtain unbiased minimum variance estimators for certain classes
of distributions which involve more than one parameter, e.g. the multinomial distribution.
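Here is a worked multinomial instance of the recipe, chosen for illustration. Take three cells with counts t1, t2 and N - t1 - t2. Then P(t1, t2) = k_t φ1^{t1} φ2^{t2} m, with k_t the multinomial coefficient, φ_i = p_i/p_0 and m = p_0^N. In this case the φ_i depend on both parameters, a slight stretch of the abstract's wording. To estimate g = p1 p2, note that 1/p_0 = 1 + φ1 + φ2. Hence G = g/m = φ1 φ2 (1 + φ1 + φ2)^{N-2}, and reading off A_t gives A_t/k_t = t1 t2 / (N(N - 1)). The sketch below verifies unbiasedness exactly with rational arithmetic.

```python
from fractions import Fraction
from math import factorial


def k_coef(t1, t2, N):
    """k_t: multinomial coefficient N! / (t1! t2! (N - t1 - t2)!)."""
    return Fraction(factorial(N), factorial(t1) * factorial(t2) * factorial(N - t1 - t2))


def A_coef(t1, t2, N):
    """A_t: coefficient of phi1^t1 phi2^t2 in G = phi1 phi2 (1 + phi1 + phi2)^(N - 2)."""
    if t1 < 1 or t2 < 1 or t1 + t2 > N:
        return Fraction(0)
    return Fraction(factorial(N - 2),
                    factorial(t1 - 1) * factorial(t2 - 1) * factorial(N - t1 - t2))


def umvue_p1p2(t1, t2, N):
    """Estimator A_t / k_t of g = p1 p2."""
    return A_coef(t1, t2, N) / k_coef(t1, t2, N)


def expectation(N, p1, p2):
    """Exact expectation of the estimator under the multinomial law."""
    p0 = 1 - p1 - p2
    return sum(k_coef(t1, t2, N) * p1 ** t1 * p2 ** t2 * p0 ** (N - t1 - t2)
               * umvue_p1p2(t1, t2, N)
               for t1 in range(N + 1) for t2 in range(N + 1 - t1))


N = 5
p1, p2 = Fraction(1, 5), Fraction(3, 10)
print(umvue_p1p2(3, 1, N))                 # 3/20
print(expectation(N, p1, p2) == p1 * p2)   # True
```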







10. A Comparison of the Effectiveness of Tournaments. W. A. GLENN, Virginia 
Polytechnic Institute. 


Round robin, replicated knock-out and double elimination tournaments (in which players
are eliminated after two losses) are investigated for their effectiveness in selecting the
best of four players. Denoting the probability that player i defeats player j by $\pi_{ij}$, it is
assumed that these parameters are constants satisfying a general set of inequality
relations. A best player is here defined as one having a probability greater than 1/2 of
defeating each of the others. It is further assumed that each game results in the selection of
a winner, so that $\pi_{ji} = 1 - \pi_{ij}$. Since in some tournaments two or three players may
receive the same total number of wins, a play-off may be required for the determination of
an ultimate winner. The criteria proposed for the comparison are the probability that the
best player wins (after play-off if necessary) and the expected number of games required. For
general values of the parameters, expressions are derived for the evaluation of the criteria,
and comparisons are made on the basis of a series of assigned parameter values. The special
case in which all but one of the players are of equal strength is considered in detail. The
possibility of extending this investigation to cases involving a larger number of players is
discussed.
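The first criterion can be computed exactly for a round robin of four players by enumerating the 2^6 game outcomes. In the sketch below, a tie for the most wins is settled by choosing uniformly among the tied players. This is a simplifying stand-in for the play-off, not the paper's play-off rule. Only π_ij with i < j is read, since π_ji = 1 - π_ij.

```python
from fractions import Fraction
from itertools import combinations, product


def round_robin_win_prob(pi, player=0):
    """Probability that `player` is the round-robin winner (ties split uniformly)."""
    n = len(pi)
    games = list(combinations(range(n), 2))
    total = Fraction(0)
    for outcome in product([0, 1], repeat=len(games)):
        prob = Fraction(1)
        wins = [0] * n
        for (i, j), first_wins in zip(games, outcome):
            if first_wins:
                prob *= pi[i][j]
                wins[i] += 1
            else:
                prob *= 1 - pi[i][j]
                wins[j] += 1
        top = max(wins)
        leaders = [p for p in range(n) if wins[p] == top]
        if player in leaders:
            total += prob / len(leaders)
    return total


half = Fraction(1, 2)
equal = [[half] * 4 for _ in range(4)]
print(round_robin_win_prob(equal))                 # 1/4
# special case from the abstract: all but one player of equal strength
strong = [[Fraction(3, 5) if i == 0 else half for j in range(4)] for i in range(4)]
print(float(round_robin_win_prob(strong)))
```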


11. Application of the Geometry of Quadrics in Finite Projective Space to the
Construction of PBIB Designs. R. C. Bose and D. K. Ray-Chaudhuri,
University of North Carolina.


The geometry of quadrics in finite projective hyperspace has been applied to construct
some series of PBIB designs, with two or three associate classes, including some new designs
with parameters in the practically useful range. Let (C) and (D) be two classes of
linear spaces such that spaces of a given class stand in the same geometrical relation to a
quadric Q in PG(n, s), $s = p^m$, where p is prime. Then in many instances the incidence
relationship of (C) and (D) provides a PBIB design. For example, if we take (C) as the
class of points on a non-degenerate quadric Q, and (D) as the class of lines contained in Q,
we get a PBIB design with the following parameters:

$$v = N(0, n),\quad b = N(1, n),\quad r = N(0, n - 2),\quad k = s + 1,\quad \lambda_1 = 1,\quad \lambda_2 = 0,$$

$$n_1 = sN(0, n - 2),\quad n_2 = N(0, n) - sN(0, n - 2) - 1,\quad p^1_{11} = (s - 1) + s^2N(0, n - 4),\quad p^2_{11} = N(0, n - 2),$$

where N(p, n) denotes the number of p-flats in Q and is given by the formulae

(i) $N(p, n) = \prod_{r=0}^{p} \dfrac{s^{2k - 2p + 2r} - 1}{s^{r+1} - 1}$, if $n = 2k$, $p \le k - 1$;

(ii) $N(p, n) = \prod_{r=0}^{p} \dfrac{s^{n - 2p + 2r} - s^{k - p + r} + s^{k - p + r - 1} - 1}{s^{r+1} - 1}$, if $n = 2k - 1$, $p \le k - 2$ and Q is elliptic;

(iii) $N(p, n) = \prod_{r=0}^{p} \dfrac{s^{n - 2p + 2r} + s^{k - p + r} - s^{k - p + r - 1} - 1}{s^{r+1} - 1}$, if $n = 2k - 1$, $p \le k - 1$ and Q is hyperbolic.
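The flat-counting formulae and the resulting design parameters can be evaluated with exact rational arithmetic, as sketched below. The only content taken from the source is formulae (i) to (iii) and the point-line parameters. The checks v r = b k and r(k - 1) = λ1 n1 + λ2 n2 are standard identities for such designs. For s = 3 and n = 4 (parabolic), the sketch gives 40 points and 40 lines with 4 lines on each point. For s = 2 and n = 5 (elliptic), it gives 27 points and 45 lines with 5 lines on each point.

```python
from fractions import Fraction


def num_flats(p, n, s, kind):
    """N(p, n): number of p-flats on a non-degenerate quadric in PG(n, s).

    kind is "parabolic" (n = 2k) or "elliptic" / "hyperbolic" (n = 2k - 1).
    """
    if kind == "parabolic":
        if n % 2:
            raise ValueError("a parabolic quadric needs n = 2k")
        k = n // 2
        p_max = k - 1
    elif kind in ("elliptic", "hyperbolic"):
        if n % 2 == 0:
            raise ValueError("elliptic and hyperbolic quadrics need n = 2k - 1")
        k = (n + 1) // 2
        p_max = k - 2 if kind == "elliptic" else k - 1
    else:
        raise ValueError(kind)
    if not 0 <= p <= p_max:
        raise ValueError("p outside the range covered by the formulae")
    total = Fraction(1)
    for r in range(p + 1):
        e = k - p + r
        if kind == "parabolic":
            num = s ** (2 * e) - 1
        elif kind == "elliptic":
            num = s ** (n - 2 * p + 2 * r) - s ** e + s ** (e - 1) - 1
        else:
            num = s ** (n - 2 * p + 2 * r) + s ** e - s ** (e - 1) - 1
        total *= Fraction(num, s ** (r + 1) - 1)
    return int(total)


def line_design(n, s, kind):
    """Points versus lines on Q: (v, b, r, k, n1, n2) with lambda1 = 1, lambda2 = 0."""
    v = num_flats(0, n, s, kind)
    b = num_flats(1, n, s, kind)
    r = num_flats(0, n - 2, s, kind)
    n1 = s * r
    return dict(v=v, b=b, r=r, k=s + 1, n1=n1, n2=v - n1 - 1)


for args in ((4, 3, "parabolic"), (5, 2, "elliptic"), (6, 2, "parabolic")):
    d = line_design(*args)
    print(args, d, d["v"] * d["r"] == d["b"] * d["k"])
```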







12. Some New Cases of the Packing Problem in Finite Projective Space with
Applications to Fractionally Replicated Designs. R. C. Bose, University
of North Carolina.


The general packing problem for the finite projective space PG(r - 1, p^m) of r - 1
dimensions, based on the finite field with $s = p^m$ elements, may be stated as follows: find the
maximum number of points which can be chosen in PG(r - 1, p^m) so that no t lie on a linear
space of dimension t - 2 or less. This number may be denoted by $n = n_t(r, s)$, and the
associated set of points may be said to give a tight packing of the tth order for the space.
If t = 2u or 2u + 1, the coordinates of the points may be used to obtain a $1/s^{n-r}$ fraction
of the $s^n$ treatments in a factorial experiment with n factors, so that no u-factor or lower
interaction is aliased with a u-factor or lower interaction. In this paper it is shown that
$n_5(6, 3) = 12$ and $n_4(5, 3) = 11$. The associated set of points in the first case is the set of points
common to the three quadrics Q: = 2x:z2 + ai%3 + %1%4 + Lots + Tot, + Fra, = 0, Q. = 
Like + 2Xrs5 + Wire + LoXs + LoXe— + Tsre_ = O, Qs = X3%q + 2X3X5 + L3%o + TMers + BZriz—e + 
2x32, = 0. Fractionally replicated designs $1/3^6 \times 3^{12}$ and $1/3^5 \times 3^{11}$, in which no main
effect or two-factor interaction is aliased with a main effect or two-factor interaction, follow.
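The packing condition ("no t of the points lie on a (t - 2)-flat") says that every t of the chosen coordinate vectors are linearly independent. The sketch below checks this by brute force over GF(p). It assumes m = 1, a prime field, to keep the arithmetic simple. The example is a standard small case rather than one of the paper's sets: the conic x0 x2 = x1^2 in PG(2, 3) has 4 points, no 3 of them collinear, matching the known maximum for an arc in PG(2, 3).

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a list of integer vectors over GF(p), p prime, by Gaussian elimination."""
    m = [list(v) for v in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)                  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % p:
                factor = m[i][col]
                m[i] = [(x - factor * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def is_tight_packing(points, t, p):
    """True if no t of the points lie in a (t - 2)-flat, i.e. every t are independent."""
    return all(rank_mod_p(subset, p) == t for subset in combinations(points, t))


conic = [(1, 0, 0), (0, 0, 1), (1, 1, 1), (1, 2, 1)]       # x0 x2 = x1^2 over GF(3)
print(is_tight_packing(conic, 3, 3))                        # True
print(is_tight_packing([(1, 0, 0), (0, 1, 0), (1, 1, 0)], 3, 3))   # False (collinear)
```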


13. A Necessary Condition for Existence of a Regular and Symmetrical p.b.i.b.
Design of Triangular Type. J. Ogawa, University of North Carolina.


A necessary condition for existence of a symmetrical b.i.b. design in terms of the
Hasse-Minkowski p-invariant was obtained by S. S. Shrikhande. Similar necessary conditions
for regular symmetrical p.b.i.b. designs of group divisible type and of $L_2$ type were
obtained by R. C. Bose and W. S. Connor and by S. S. Shrikhande, respectively. The purpose
of this note is to give a similar necessary condition for a regular symmetrical p.b.i.b. design
of triangular type.


14. Use of Partially Balanced Block Designs with Three Associate Classes for
Confounded, Asymmetrical Factorial Arrangements. (Preliminary report)
Badrig M. Kurkjian, Diamond Ordnance Fuze Laboratories. (By title)


The results and technique, presented by Zelen (Ann. Math. Stat. 29 (1958) pp. 22-40) for 
the case of GD designs, are extended to treat two cases involving partially balanced incom- 
plete block designs with three associate classes when used in conjunction with factorial 
experiments. The two PBIB designs considered are those that result by (1) replacing each 
treatment of a BIB by another, complete BIB design and (2) replacing each treatment of 
a GD design by n treatments. Vartak (Ann. Math. Stat. 26 (1955) pp. 420-438) has shown 
that each of these designs is at most a PBIB with three associate classes. For each of these 
designs, the solutions of the reduced normal equations for the treatment estimates are 
found. With respect to the factorial aspect of the problem, the variances and covariances of 
the various main effects and interaction terms have been derived for the class of designs
above. It is shown that these can be written as Kronecker products of matrices which lead
directly to the appropriate sums of squares associated with the analysis of variance. In 
addition, the inter-block analysis is worked out. 


15. Best Linear Estimates by Order Statistics of the Parameters of a Model
for Failure Data. Andre G. Laurent and Eldon Riehm, Wayne State
University and Bendix Aviation Research Laboratory.


The paper presents tables of 1) the order statistics covariance matrix for samples of size
n = 1 to 10 drawn from the population with survival function $S(t) = \exp[1 + t - \exp(t)]$,







where $t = X/E_0$ or $t = (X - X_0)/r$; and 2) the minimum variance linear unbiased estimates
of the parameters $E_0$, $X_0$, $r$, based on order statistics for singly censored samples of size
1 to 10, and their variances and covariances. Several aspects (failures, aging, waiting time)
of the problem of "estimation cost" versus "precision" are considered, and different estimates
are compared from the viewpoint of efficiency.
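Best linear unbiased estimates from order statistics are built from the means and covariances of the standardized order statistics. The sketch below computes the means $E[T_{(i:n)}]$ for the standardized survival function S(t) = exp[1 + t - exp(t)] by Simpson's rule. It checks them against the general identity $\sum_i E[T_{(i:n)}] = nE[T]$. It is a sketch of one ingredient only. The covariance tables and the generalized-least-squares step that turns these moments into estimates are not shown. The integration limit of 6 is an assumption justified by S(6) being about e^{-396}.

```python
import math


def survival(t):
    """Standardized survival function S(t) = exp[1 + t - exp(t)], t >= 0."""
    return math.exp(1.0 + t - math.exp(t))


def density(t):
    """f(t) = -S'(t) = S(t) (exp(t) - 1)."""
    return survival(t) * (math.exp(t) - 1.0)


def simpson(g, a, b, m=4000):
    """Composite Simpson rule with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, m))
    return total * h / 3.0


def expected_order_stat(i, n, upper=6.0):
    """E[T_(i:n)], the mean of the i-th smallest of n standardized lifetimes."""
    c = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))

    def integrand(t):
        F = 1.0 - survival(t)
        return t * c * F ** (i - 1) * survival(t) ** (n - i) * density(t)

    return simpson(integrand, 0.0, upper)


mean_T = simpson(survival, 0.0, 6.0)                 # E[T] = integral of S(t)
means = [expected_order_stat(i, 5) for i in range(1, 6)]
print(round(mean_T, 6), [round(m, 6) for m in means])
```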


16. On Testability in Normal ANOVA and MANOVA with All “Fixed Effects.”
S. N. Roy and J. Roy, University of North Carolina and Indian Statistical
Institute.


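With the notation of the above paper it can be proved that, for standard form I,
$\operatorname{Rank} A \le \operatorname{Rank}\begin{bmatrix} A \\ C \end{bmatrix} \le \operatorname{Rank} A + \operatorname{Rank} C$ (both equalities being unattainable at the same time),
and, for standard form II, $0 \le \operatorname{Rank} A - \operatorname{Rank} AB \le m - \operatorname{Rank} B$ (both equalities being
unattainable at the same time), and, in any case, that what we are generally testing is an
$H_0^*$ such that $H_0 \subset H_0^* \subset$ Model. In this paper it is shown that, for I, (i) if
$\operatorname{Rank} A = \operatorname{Rank}\begin{bmatrix} A \\ C \end{bmatrix} < \operatorname{Rank} A + \operatorname{Rank} C$, then $H_0 = H_0^* \subset$ Model, in which case $H_0$ is said to be
testable in the strong sense; (ii) if $\operatorname{Rank} A < \operatorname{Rank}\begin{bmatrix} A \\ C \end{bmatrix} < \operatorname{Rank} A + \operatorname{Rank} C$, then
$H_0 \subset H_0^* \subset$ Model, in which case $H_0$ is said to be testable in the weak sense; and (iii) if
$\operatorname{Rank} A < \operatorname{Rank}\begin{bmatrix} A \\ C \end{bmatrix} = \operatorname{Rank} A + \operatorname{Rank} C$, then $H_0 \subset H_0^* =$ Model, in which case $H_0$ is said to
be untestable. Likewise, for II, it is shown that (i) if $0 < \operatorname{Rank} A - \operatorname{Rank} AB = m - \operatorname{Rank} B$,
then $H_0 = H_0^* \subset$ Model; (ii) if $0 < \operatorname{Rank} A - \operatorname{Rank} AB < m - \operatorname{Rank} B$, then
$H_0 \subset H_0^* \subset$ Model; and (iii) if $0 = \operatorname{Rank} A - \operatorname{Rank} AB < m - \operatorname{Rank} B$, then $H_0 \subset H_0^* =$
Model.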
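The form-I classification depends only on three ranks, so it can be applied mechanically. The sketch below assumes, following standard form I, that the model is $E(X') = A\xi$ and the hypothesis is $C\xi = 0$. It computes exact ranks over the rationals, with C stacked below A, and returns the case that applies. The three small matrices in the usage lines are hypothetical examples, one for each case.

```python
from fractions import Fraction


def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def classify_form_I(A, C):
    """Classify H0: C xi = 0 under E(X') = A xi by the rank conditions for form I."""
    ra, rc, rac = rank(A), rank(C), rank(A + C)   # A + C stacks the rows of C below A
    if ra == rac < ra + rc:
        return "testable in the strong sense"
    if ra < rac < ra + rc:
        return "testable in the weak sense"
    if ra < rac == ra + rc:
        return "untestable"
    raise ValueError("excluded case (e.g. Rank C = 0)")


print(classify_form_I([[1, 0], [0, 1]], [[1, -1]]))
print(classify_form_I([[1, 0, 0], [0, 1, 1]], [[1, -1, 0], [0, 1, -1]]))
print(classify_form_I([[1, 1]], [[1, -1]]))
```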


17. Contributions to Univariate and Multivariate Analysis of Variance with
"Fixed Effects," Normal Error and "Random Effects" Not Necessarily
Normal. S. N. Roy and Whitfield Cobb, University of North Carolina
and The Woman's College of the University of North Carolina
(Greensboro).


For simplicity of discussion, a single-response two-factor experiment under certain broad
classes of designs is considered, and an additive model is postulated such that one factor
goes with "fixed effects," and the other with "random effects" characterizable in terms
of a random sample from an unknown continuous distribution, which is assumed to be
approximated, in successive stages, by a two-valued distribution with probabilities 1/2 and 1/2,
a three-valued distribution with probabilities 1/3, 1/3 and 1/3, and so on, the values in each
being assumed to be unknown. The continuous variate is also assumed to be independent of the
normal error. Under the model, confidence bounds are obtained on the difference between
the two values at the first stage, on the two consecutive differences (simultaneously) at
the second stage, on the three consecutive differences (simultaneously) at the third stage,
and so on. This is, of course, in addition to what is usually done for the "fixed effects."
These techniques are then extended, first to the case of an experiment with a single response 


and more than two factors, and then to multiple response and multifactor experiments. 
In the latter case, as a step toward this, a generalization has had to be made of the notion 
of m-tiles of a univariate distribution to the case of multivariate distributions. 







18. Some Nonparametric Analogues of "Normal" ANOVA and MANOVA and
of Studies in "Normal" Association. S. N. Roy and V. P. Bhapkar, University
of North Carolina.


In a multifactor multiresponse experiment where some of the responses are assumed 
to be continuous, some discrete, some categorical with an implied ranking (like good, fair, 
poor, etc.) and the rest purely categorical, with a similar division into four classes for the 
factors, a finite number of class intervals are assigned to each continuous response or factor 
and a probability model is postulated in terms of a product-multinomial distribution with 
unknown probabilities in the multidimensional cells and preassigned weights or scores 
along all "marginals," response or factor, that are structured, and, of course, no such
scores along the "marginals" that are purely categorical. Under this model, hypotheses
are posed analogous to (i) different kinds of hypotheses in ‘‘normal’’ model I ANOVA and 
MANOVA, including analysis of covariance and regression and (ii) different kinds of inde- 
pendence and regression relations in ‘‘normal’’ multivariate distributions. Large sample 
tests of such hypotheses are offered, and an indication is given as to how to obtain the 
asymptotic powers. 


19. On Moments of Order Statistics from Normal Populations. Z. Govindarajulu,
University of Minnesota. (By title)


Let $X_{1:n} < X_{2:n} < \cdots < X_{n:n}$ be the order statistics (os) from a sample of size n from
the standard normal population. Contributions to the problem of moments of os from normal
samples have been made by Tippett (1925), Hastings et al. (1947), Jones (1948), Godwin (1949),
Cole (1951), Rosser (1951), Ruben (1956), Bose and Gupta (1956), Teichroew (1956), and Sarhan
and Greenberg (1956). Exact values of low moments of os for sample size six or less and
numerical values for sample size twenty or less are available. In this investigation simple
recursion formulae among the first, second, and mixed (linear) moments have been derived.
Certain identities among the moments which are true in general (that is, without the use of
normality) are also obtained. The above will enable one to extend Ruben's table of moments
of the largest os in samples of size fifty or less to moments of all os in samples of size fifty
or less. It is shown that it is sufficient to know one first moment when n is even and one
second moment when n is odd, in order to solve for the first and second moments of all os from
samples of size n, in terms of those for the preceding n. It is also shown that at most
(n - 4)/2 mixed moments for even n, and (n - 3)/2 mixed moments together with one second
moment for odd n, are sufficient to solve for the product-moment matrix of the vector of
ordered variables in samples of size n, in terms of the second and mixed moments of os for the
preceding n.
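A classical example of an identity that holds without normality is the recurrence $(n - i)\mu_{i:n} + i\,\mu_{i+1:n} = n\,\mu_{i:n-1}$ for the means of order statistics. It is shown here only as an illustration and is not necessarily one of the paper's formulae. The sketch computes normal order-statistic means by Simpson's rule, checks the recurrence, and checks the known value $E[X_{2:2}] = 1/\sqrt{\pi}$. The integration range [-10, 10] is an assumption that is ample for the normal tails.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mu(i, n, lim=10.0, m=4000):
    """E[X_(i:n)] for the standard normal, by Simpson's rule on [-lim, lim]."""
    c = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    h = 2.0 * lim / m

    def g(x):
        return x * c * Phi(x) ** (i - 1) * (1.0 - Phi(x)) ** (n - i) * phi(x)

    total = g(-lim) + g(lim) + sum((4 if k % 2 else 2) * g(-lim + k * h) for k in range(1, m))
    return total * h / 3.0


n = 5
for i in range(1, n):
    lhs = (n - i) * mu(i, n) + i * mu(i + 1, n)
    print(i, round(lhs, 10), round(n * mu(i, n - 1), 10))
print(mu(2, 2), 1.0 / math.sqrt(math.pi))
```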


20. A Note on J. Roy's “Step-down Procedure in Multivariate Analysis.” V. P.
Bhapkar, University of North Carolina. (By title)


Test criteria in multivariate analysis are usually derived either from the λ-criterion or
the largest and/or the smallest characteristic roots. Both of these can be regarded as special
cases of the general "union-intersection" principle. An alternative procedure, called the
"step-down" procedure, was used by S. N. Roy and R. E. Bargmann (Ann. Math. Stat.
Vol. 29 (1958), 491-503) to test multiple independence of normal variables. This procedure
was recently applied by J. Roy (Ann. Math. Stat. Vol. 29 (1958), pp. 1177-1187) to derive
test criteria for a large class of hypotheses other than that of multiple independence. In
this note, J. Roy's method has been used to test multiple independence of normal variables
with means given by a general linear model. Simultaneous confidence bounds on an
appropriate set of "deviation parameters" are also obtained.







21. On a Class of Problems in Multivariate Analysis of Variance. S. N. Roy
and J. Roy, University of North Carolina and Indian Statistical Institute.
(By title)


Assuming for a model that $X_{p \times n} = [x_1 \cdots x_n]$ is a set of n independent stochastic vectors
such that $x_i \sim N[E(x_i), \Sigma_{p \times p}]$ $(i = 1, 2, \ldots, n)$, and furthermore that
$E(X')_{n \times p} = A_{n \times m}\, \xi_{m \times p}$, where A is a known matrix given by the design and by what the
experimental statisticians call the "model," and $\xi$ is a matrix of unknown parameters, a
hypothesis is posed in the form (to be called standard form II) $H_0: \xi_{m \times p} = B_{m \times k}\, \eta_{k \times p}$,
where B is given but $\eta$ is unknown. For a hypothesis expressed in this form (which is
convenient for a wide class of problems including that of linearity of regression), the
matrices $S^*$ and $S$ due to the hypothesis and due to the error are computed as a preliminary
to the construction of different alternative tests, one of them being that in terms of the
largest root of $S^*S^{-1}$. The tie-up is then discussed between any test of a hypothesis
formulated this way and the corresponding test of an equivalent hypothesis formulated in
the form to be called standard form I, $H_0: C_{s \times m}\, \xi_{m \times p} = O_{s \times p}$, where C is given but $\xi$,
of course, is assumed to be unknown.


22. A Note on Confidence Bounds Connected with ANOVA and MANOVA for
Balanced and Partially Balanced Incomplete Block Designs. V. P. Bhapkar,
University of North Carolina. (By title)


It is known that confidence bounds can be placed on the "deviation parameter" associated
with the test of a linear hypothesis. This "deviation parameter" can be regarded
as a measure of departure from the "total" hypothesis under consideration. It is also
possible to make simultaneous confidence statements about "partial" deviation parameters,
which can be regarded as measures of departure from various "partials" of the "total"
hypothesis. In this note, the hypothesis considered is that of equality of treatment effects
(scalar effects for ANOVA and vector effects for MANOVA) in experimental designs. The
ANOVA deviation parameter for BIBD turns out to be $(\lambda v/k)\sum_{i=1}^{v} \xi_i^2$, where $\xi_i = t_i - \bar t$,
or $(\lambda v/k)\,\xi'\xi$, where $\xi_{v \times 1} = t_{v \times 1} - \bar t_{v \times 1}$ [with $t' = (t_1, \ldots, t_v)$], and the MANOVA
deviation parameter is $c_{\max}[\Xi'_{p \times v}\Xi_{v \times p}]$, where $\Xi_{v \times p} = T_{v \times p} - \bar T_{v \times p}$ [with
$T'_{p \times v} = (t_1, t_2, \ldots, t_v)$]. The partial ANOVA deviation parameters for BIBD are found to be
$(\lambda v/k)\sum_j \xi_{\alpha_j}^2$, associated with the partial hypothesis $t_{\alpha_1} = t_{\alpha_2} = \cdots$, where
$\xi_{\alpha_j} = t_{\alpha_j} - \bar t_\alpha$ and $\bar t_\alpha$ is the mean of the $t_{\alpha_j}$ involved, with corresponding forms for the
partial MANOVA deviation parameters. The ANOVA deviation parameter for PBIBD is
$\sum_i \sum_j a_{ij} t_i t_j$, where $a_{ii} = r(k - 1)/k$ and $a_{ij} = -\lambda_m/k$ if the ith and jth treatments are
mth associates, with a corresponding form for MANOVA. In general, the ANOVA deviation
parameter is $t'Ct$, where C is the matrix of coefficients of the adjusted normal equations,
with a corresponding form for the MANOVA.
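The link between the general form t'Ct and the BIBD expression can be checked exactly. For an equireplicate design with block size k, the coefficient matrix of the adjusted normal equations is C = rI - NN'/k, where N is the treatment-by-block incidence matrix. Its entries are r(k - 1)/k on the diagonal and -λ_ij/k off it, as in the PBIBD statement. The sketch below uses the Fano plane (v = b = 7, r = k = 3, λ = 1) as a hypothetical example design. It confirms that t'Ct = (λv/k) Σ (t_i - t̄)² for an arbitrary effect vector.

```python
from fractions import Fraction

# Fano plane: v = b = 7, r = k = 3, lambda = 1
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def c_matrix(blocks, v, k):
    """C = r I - N N' / k for an equireplicate design with block size k."""
    N = [[1 if i in blk else 0 for blk in blocks] for i in range(v)]
    r = sum(N[0])
    return [[Fraction(r if i == j else 0)
             - Fraction(sum(a * b for a, b in zip(N[i], N[j])), k)
             for j in range(v)] for i in range(v)]


def quad_form(C, t):
    return sum(C[i][j] * t[i] * t[j] for i in range(len(t)) for j in range(len(t)))


v, k, lam = 7, 3, 1
C = c_matrix(BLOCKS, v, k)
t = [Fraction(x) for x in (2, -1, 0, 5, 3, 3, -4)]      # hypothetical treatment effects
tbar = sum(t) / v
print(quad_form(C, t), Fraction(lam * v, k) * sum((x - tbar) ** 2 for x in t))
```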


23. Sufficient Partitions for a Class of Coin-Tossing Problems. (Preliminary 
report) T. V. Narayana, University of Alberta. 


In the following, experiment $G_r$ is considered (cf. also Blackwell and Girshick, Theory of
Games and Statistical Decisions, p. 222): the probability of a coin falling head is $p_1$
$(0 < p_1 < 1)$ if in the previous trial the outcome was tail, and the probability of its falling
head is $p_2$ $(0 < p_2 < 1)$ if in the previous trial the outcome was head. At the first trial
the probability of head is $p_1$. $G_r$ consists of tossing the coin until that trial when the total
number of heads exceeds the number of tails by r for the first time. The sample space







$\mathcal{Z} = (Z, \Omega, p)$ is considered, where Z consists of points representing the outcomes of $G_r$,
and $\Omega = \{0 < p_1 < 1\} \times \{0 < p_2 < 1\}$ with $p_1 + p_2 > 1$. Using a combinatorial lemma
established by the author (Comptes Rendus, t. 240, pp. 1188-89), a sufficient partition for this
sample space has been determined for all $r \ge 2$. The problem of estimating $p_1$, $p_2$ is being
considered.
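Why a sufficient partition exists at all can be seen from the factorization theorem. The likelihood of an outcome of $G_r$ depends only on how often each kind of transition occurs, where the first toss counts as following a tail. The sketch below tabulates those exponents. It illustrates this generic Markov-chain fact, not the author's combinatorial construction, and the partition found in the paper may differ. For r = 2, the outcomes HTTHHH and THHTHH receive the same likelihood $p_1^2(1 - p_1)p_2^2(1 - p_2)$.

```python
def is_outcome(seq, r):
    """True if `seq` (a string of 'H'/'T') is a complete outcome of G_r."""
    lead = 0
    for idx, toss in enumerate(seq):
        lead += 1 if toss == "H" else -1
        if lead == r:
            return idx == len(seq) - 1      # play stops the first time the lead reaches r
    return False


def likelihood_exponents(seq):
    """Exponents (a, b, c, d) with likelihood p1^a (1 - p1)^b p2^c (1 - p2)^d."""
    a = b = c = d = 0
    previous = "T"                          # the first toss uses p1, as after a tail
    for toss in seq:
        if previous == "T":
            a, b = (a + 1, b) if toss == "H" else (a, b + 1)
        else:
            c, d = (c + 1, d) if toss == "H" else (c, d + 1)
        previous = toss
    return a, b, c, d


for s in ("HH", "THHH", "HTHH", "HTTHHH", "THHTHH"):
    print(s, is_outcome(s, 2), likelihood_exponents(s))
```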


24. Estimation of the Mean and Variance of a Quantitative Characteristic in a 
Polygenic System. ALLAN G. ANDERSON, Western Kentucky State College. 


A system is considered in which a quantitative characteristic (such as yield of corn) is 
affected by many gene-pairs whose contributions are equal and additive. It is assumed that 
means and variances are known for a set of inbred parent lines, and formulae are developed 
for the estimation of mean and variance of any hybrid descended from members of the 
inbred set by means of a known pedigree. One year of field experimentation is required to 
obtain needed data on all first generation hybrid crosses possible among the inbreds, but 
from then on attention can be directed toward those strains for which the prognosis is 
favorable based on the estimated means and variances. 


25. Selecting a Subset Containing the Best of Several Binomial Populations.
S. S. Gupta and M. Sobel, Bell Telephone Laboratories, Inc.


Let $x_i$ denote the number of successes in a known number $n_i$ of observations from a
binomial population $\Pi_i$ with unknown probability $p_i$ of success in a single trial; let
$y_i = x_i/n_i$ $(i = 1, 2, \ldots, k)$. The problem is to select a subset of the k populations $\Pi_i$ so
that the "best" population (i.e., the population with the largest value of p) will be included
in the selected subset with a preassigned probability $P^*$, regardless of the true values of
the $p_i$. The suggested procedure R is "Retain a population $\Pi_i$ if and only if
$y_i \ge \max(y_1, y_2, \ldots, y_k) - \delta$." The constant $\delta$ $(\ge 0)$ is determined so as to satisfy the given
probability requirement. Expressions for the probability of a correct selection (i.e., the
selection of a subset containing the best population) for the procedure R are derived and, in
the case of a common number n of observations, these are used to construct tables of the
smallest constant needed to carry out the procedure R for selected values of n, k and $P^*$.
Formulae are obtained for the expected number of populations retained in the selected
subset, and tables are given for the expected proportion of populations retained. Alternative
procedures based on the transformation of variables are briefly discussed.
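With a common n, the best population is retained exactly when every other count satisfies $x_j \le x_{best} + n\delta$. The probability of a correct selection is therefore a sum over the best population's count of a product of binomial c.d.f.s. Because the counts are integers, only d = floor(nδ) matters, and the sketch takes δ = d/n with d an integer. The p values in the usage lines are hypothetical. The sketch evaluates the probability for a given configuration only. It does not search for the least favorable configuration that the paper's tables of the constant are based on.

```python
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def prob_correct_selection(ps, n, d, best=None):
    """P(procedure R retains the best population), common n, delta = d / n.

    The best population is retained iff x_j <= x_best + d for every other j.
    """
    if best is None:
        best = max(range(len(ps)), key=lambda i: ps[i])
    total = 0.0
    for x in range(n + 1):
        term = binom_pmf(x, n, ps[best])
        for j, p in enumerate(ps):
            if j != best:
                term *= sum(binom_pmf(y, n, p) for y in range(min(n, x + d) + 1))
        total += term
    return total


for d in range(4):
    print(d, round(prob_correct_selection([0.3, 0.4, 0.5], n=20, d=d), 4))
```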


26. Hadamard Matrices and a Problem in the Theory of Code Construction.
R. C. Bose and S. S. Shrikhande, University of North Carolina.


A sequence $a = (a_1, a_2, \ldots, a_n)$, $a_i = 0$ or 1, is called an n-place message. The Hamming
distance between two n-place messages is the number of positions in which they differ. Let
A(n, d) denote the maximum number of n-place messages that can be constructed such that
the distance between any two messages is greater than or equal to a preassigned positive
integer $d < n$. We prove that the following statements are equivalent: (i)
$A(4t, 2t) = 8t$, (ii) $A(4t - 1, 2t) = 4t$, (iii) there exists a symmetric balanced incomplete
block design with parameters $v = b = 4t - 1$, $r = k = 2t - 1$, $\lambda = t - 1$, and (iv) a Hadamard
matrix of order 4t exists. This generalizes results of Plotkin (Research Division Report
51-20 (1951), University of Pennsylvania, Moore School of Electrical Engineering), who
showed that $A(4t, 2t) \le 8t$ and $A(4t - 1, 2t) \le 4t$, and that the maximal codes with
$A(4t, 2t) = 8t$ and $A(4t - 1, 2t) = 4t$ could be constructed if $4t - 1$ is a prime. The
structure of these maximal codes is also investigated here. These maximal codes can be
constructed for all values of $t \le 50$, except possibly for t = 23, 29, 39, 46 and 47.
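The passage from a Hadamard matrix of order 4t to the codes in (i) and (ii) can be sketched directly. Map +1 to 0 and -1 to 1. The rows of H and -H then give 8t words of length 4t at mutual distance at least 2t. Normalizing the first column to +1 and deleting it gives 4t words of length 4t - 1 at distance 2t. For the Hadamard matrix, the sketch uses the Paley construction. That construction applies when 4t - 1 = q is a prime with q ≡ 3 (mod 4), which corresponds to Plotkin's case. The value q = 7 (t = 2) is just an example.

```python
def paley_hadamard(q):
    """Hadamard matrix of order q + 1 (Paley construction), q a prime with q % 4 == 3."""
    chi = [0] + [1 if pow(a, (q - 1) // 2, q) == 1 else -1 for a in range(1, q)]
    size = q + 1
    S = [[0] * size for _ in range(size)]
    for j in range(1, size):
        S[0][j] = 1
        S[j][0] = -1
    for i in range(q):
        for j in range(q):
            S[i + 1][j + 1] = chi[(j - i) % q]      # Jacobsthal matrix block
    return [[S[i][j] + (1 if i == j else 0) for j in range(size)] for i in range(size)]


def to_bits(row):
    return tuple(0 if x == 1 else 1 for x in row)


def codes_from_hadamard(H):
    """Binary codes of the kinds in (i) and (ii): lengths 4t and 4t - 1."""
    normalized = [[x * row[0] for x in row] for row in H]      # first column all +1
    code_4t = [to_bits(row) for row in H] + [to_bits([-x for x in row]) for row in H]
    code_4t_minus_1 = [to_bits(row[1:]) for row in normalized]
    return code_4t, code_4t_minus_1


def min_distance(code):
    return min(sum(a != b for a, b in zip(u, v))
               for i, u in enumerate(code) for v in code[i + 1:])


H = paley_hadamard(7)
c1, c2 = codes_from_hadamard(H)
print(len(c1), min_distance(c1), len(c2), min_distance(c2))   # 16 4 8 4
```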







27. Modified Neyman-Pearson Methods Which Avoid “Paradoxes” and Tend
to Coincide with Other Methods. (Preliminary report) Allan Birnbaum,
Columbia University.


The methods proposed take the following form in problems of testing between two simple
hypotheses $H_i$ specifying respective densities $p_i(x)$, $i = 1, 2$, if $r(X) = p_2(X)/p_1(X)$ has
a continuous c.d.f. $F_i(u)$ under $H_i$: if outcome x is observed, report the pair of error levels
$\alpha(x)$, $\beta(x)$, where $\alpha(x) = 1 - F_1(r(x))$ and $\beta(x) = F_2(r(x))$; if $q(x) = \beta(x)/\alpha(x) \gg 1$,
the observation x is considered strong evidence for $H_2$ as against $H_1$ (in the usual general
Neyman-Pearson sense); if q(x) is not far from 1, the observation is considered inconclusive
evidence for comparing the hypotheses, etc. Such methods avoid the usual accept-or-reject
formulations, which seem over-schematic in scientific research applications. They avoid
semblances of reaching strong conclusions from weak data, which may appear with more
conventional methods (cf., e.g., Cohen, Ann. Math. Stat., Vol. 29 (1958), pp. 947-972). In
any experiment providing at least a moderate amount of information, these methods give
inferences which tend to agree with those obtained by maximum-likelihood methods; or
Bayesian methods (excluding only extremely unequal a priori probabilities; cf. Lindley,
Biometrika, Vol. 44 (1957), pp. 187-192); or Lehmann's approach (excluding only k very far
from 1, on p. 1171 of Ann. Math. Stat., Vol. 29 (1958), pp. 1167-1176).
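The reported pair of error levels is easy to compute when the likelihood ratio is monotone in a sufficient statistic. The sketch below uses a hypothetical normal example: H1 says X̄ comes from N(0, 1/n) and H2 says it comes from N(μ, 1/n), with μ > 0. Here r = p2/p1 increases with the sample mean, so α(x) = P1(X̄ > x̄) and β(x) = P2(X̄ ≤ x̄). At the midpoint x̄ = μ/2 the two levels coincide and q = 1, an inconclusive outcome. At x̄ = μ, q is far above 1.

```python
from math import erf, sqrt


def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def error_levels(xbar, n, mu):
    """(alpha, beta, q) for H1: N(0, 1) vs H2: N(mu, 1), mu > 0, from the sample mean.

    r(X) = p2(X)/p1(X) is increasing in the sample mean, so alpha = P1(r(X) > r(x))
    and beta = P2(r(X) <= r(x)) are normal tail areas of the sample mean.
    """
    alpha = 1.0 - Phi(sqrt(n) * xbar)
    beta = Phi(sqrt(n) * (xbar - mu))
    return alpha, beta, beta / alpha


for xbar in (0.0, 0.25, 0.5):
    a, b, q = error_levels(xbar, n=16, mu=0.5)
    print(xbar, round(a, 4), round(b, 4), round(q, 3))
```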


28. Hypothesis Tests on the Population Lower Limit (Minimum Life). Philip
G. Carlson, Arthur Andersen & Co. (introduced by S. B. Littauer). (By
title)


Let $x_1 < x_2 < x_3 < \cdots < x_n$ be an ordered sample of n elements from a population
$F(x, \epsilon, \sigma, K)$, where x is a random variable, $\epsilon$ is the lower limit, $\sigma$ is a scale parameter, and
K is a known family parameter. For certain populations the statistic $h_n = (x_1 - \epsilon)/(x_n - x_1)$
is independent of the scale parameter $\sigma$, and can be used for testing hypotheses on the
(unknown) lower limit $\epsilon$. For the population $F(x, \epsilon, \sigma, K) = ((x - \epsilon)/\sigma)^K$, $\epsilon \le x$, the $h_n$
statistic has the probability function $R(h_n) = 1 - \{1 - [h_n/(h_n + 1)]^K\}^{n-1}$. The moments
are given (for K = 1) by $E h_n^t = (n - 1)B(t + 1, n - t - 1)$. For $F(x, \epsilon, \sigma, K) = 1 - \exp\{-[(x - \epsilon)/\sigma]^K\}$,
$\epsilon \le x$, which is the Third Asymptotic Distribution of Extreme Values, $h_n$ has the probability
function

$$R(h_n) = 1 - \frac{n}{1 - [h_n/(h_n + 1)]^K}\, B\left(\frac{n[h_n/(h_n + 1)]^K}{1 - [h_n/(h_n + 1)]^K} + 1,\; n\right).$$

For K = 1, the moments are related by


Eh’, = [t(n — 1)/n(t — 1)) [[(m — 1)/n}* BaD, — Eat"). 


Since this family has been used widely to explain fatigue and reliability phenomena, the 
h, statistic can be used to test the minimum life of material, or of a component or set of 
components. 
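The power-family result can be checked by simulation. The sketch assumes $h_n = (x_1 - \epsilon)/(x_n - x_1)$. It compares the empirical distribution of $h_n$ with $R(h) = 1 - \{1 - [h/(h + 1)]^K\}^{n-1}$, which is the form this definition implies. Given the sample maximum, the other n - 1 points are independent draws below it, so $x_1/x_n$ behaves like the minimum of n - 1 such draws. The source's exponent is not legible, so treat that part as an assumption. The parameter values are hypothetical. The sketch also confirms that $h_n$ does not change when σ is rescaled.

```python
import random


def h_stat(sample, eps):
    """h_n = (x_1 - eps) / (x_n - x_1) for the ordered sample."""
    x = sorted(sample)
    return (x[0] - eps) / (x[-1] - x[0])


def R_power(h, n, K):
    """P(h_n <= h) for F(x) = ((x - eps)/sigma)^K on [eps, eps + sigma]."""
    w = (h / (h + 1.0)) ** K
    return 1.0 - (1.0 - w) ** (n - 1)


rng = random.Random(3)
n, K, eps, sigma = 5, 2.0, 10.0, 3.0
# inverse-c.d.f. sampling: X = eps + sigma * U^(1/K)
hs = [h_stat([eps + sigma * rng.random() ** (1.0 / K) for _ in range(n)], eps)
      for _ in range(20000)]
for c in (0.1, 0.3, 1.0):
    print(c, sum(h <= c for h in hs) / len(hs), round(R_power(c, n, K), 4))

sample = [eps + sigma * rng.random() ** (1.0 / K) for _ in range(n)]
rescaled = [eps + 7.0 * (x - eps) for x in sample]          # change sigma only
print(h_stat(sample, eps), h_stat(rescaled, eps))
```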


29. Truncation and Tests of Hypotheses II. Irwin Guttman, Princeton
University. (By title)


The distribution of the sum of squares of n truncated normal variables is derived for
the cases n = 1(1)4, where the terminus point is "a" standard deviations on either side of
the mean. The differences in power and size of tests of hypotheses concerning the variance
(the mean assumed known) are contrasted with the usual procedure, i.e., assuming that a random
variable has a "complete" normal distribution, and the correct significance points are
obtained. An indication of the corresponding "F" situation is given.
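The case n = 1 already shows the effect, and there the exact distribution is elementary. If X is standard normal truncated to [-a, a], then $P(X^2 \le c) = (2\Phi(\sqrt{c}) - 1)/(2\Phi(a) - 1)$ for $c < a^2$. The sketch compares two quantities under hypothetical values, a = 2 and level 5%. The first is the actual size of the usual $\chi^2_1$ test when the data are truncated. The second is the corrected significance point. It covers only n = 1 with the mean known and the scale standardized. The paper's n = 2, 3, 4 distributions are not reproduced.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal


def truncated_square_cdf(c, a):
    """P(X^2 <= c) for X ~ N(0, 1) truncated to [-a, a] (the case n = 1)."""
    if c <= 0:
        return 0.0
    if c >= a * a:
        return 1.0
    return (2.0 * Z.cdf(c ** 0.5) - 1.0) / (2.0 * Z.cdf(a) - 1.0)


def truncated_upper_point(alpha, a):
    """c with P(X^2 > c) = alpha for the truncated variable."""
    mass = 2.0 * Z.cdf(a) - 1.0
    return Z.inv_cdf(0.5 * (1.0 + (1.0 - alpha) * mass)) ** 2


a = 2.0
usual = Z.inv_cdf(0.975) ** 2                # usual 5% point of chi-square(1), about 3.84
corrected = truncated_upper_point(0.05, a)
actual_size = 1.0 - truncated_square_cdf(usual, a)
print(usual, corrected, actual_size)
```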







30. Randomization and Factorial Experiments. Sylvain Ehrenfeld, New York
University. (By title)


This paper examines several questions relating to the $2^k$ factorial series of experimental
designs. One question that is considered is the effect on the usual estimating and testing
procedures of a $2^k$ factorial experiment when the relevant number of factors is k + n. If
the levels of the remaining n factors were chosen systematically, we have, in effect, a
$1/2^n \times 2^{k+n}$ fractional factorial. This question is further examined in terms of randomized
choices of the levels of the n factors. This is, in effect, a randomized fractional factorial where
$2^k$ out of $2^{k+n}$ experiments are chosen by some randomized procedure. The question of the
various methods of carrying out such procedures is examined. A particular procedure is
outlined whereby one can estimate subsets of the effects with randomized fractional
factorials without the usual assumptions of negligible interactions. The above procedure lends
itself to an approach whereby a larger and larger fraction of the full factorial is used. At 
each stage, the usual testing and estimation procedures can be carried out. These methods 
are particularly important when experimentation, for exploratory purposes, with a large 
number of factors, is carried out sequentially. 
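The two ways of fixing the levels of the $n$ remaining factors can be sketched in Python as follows. Both return $2^k$ of the $2^{k+n}$ treatment combinations, coded $-1/+1$. The randomized version draws each of the $n$ extra levels independently for every run; this is only one of the possible randomized procedures and is an assumption, not the particular procedure outlined in the paper. The function names are hypothetical.

```python
import itertools
import random


def full_factorial(k):
    """All 2**k level combinations of k two-level factors, coded -1/+1."""
    return [list(run) for run in itertools.product((-1, 1), repeat=k)]


def systematic_fraction(k, n, fixed_levels=None):
    """2**k runs of the 2**(k+n) factorial: the n remaining factors are held fixed."""
    fixed = list(fixed_levels) if fixed_levels is not None else [-1] * n
    return [run + fixed for run in full_factorial(k)]


def randomized_fraction(k, n, seed=None):
    """2**k runs of the 2**(k+n) factorial with random levels of the n remaining factors.

    Hypothetical procedure: each level is drawn independently and uniformly
    for every run; other randomization schemes are possible.
    """
    rng = random.Random(seed)
    return [run + [rng.choice((-1, 1)) for _ in range(n)] for run in full_factorial(k)]
```

In the systematic version every run shares the same levels of the extra factors, so their effects are completely confounded with the mean; randomizing those levels spreads the runs over the larger factorial.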


31. Geometrical Methods in the Construction of Group Alphabets. R. C. Bose
and Roy R. Kuebler, Jr., University of North Carolina. (Invited paper)


Notions of finite projective geometry are applied to the group alphabet introduced by
Slepian (Bell System Technical Journal, Vol. 35 (1956), pp. 203-234). This alphabet is a
$2^k$-element subgroup of the Abelian group of $2^n$ sequences of $n$ binary digits. The
construction of such an alphabet is equivalent to the distribution of an integral $W$-measure
over the points of $PG(k-1, 2)$, the measure of each point being the weight (number of unities)
of a certain sequence (letter) of the alphabet. Necessary and sufficient conditions that an
integral measure define a group alphabet include the congruences $W_{k-2,i} \equiv 0 \pmod{2^{k-2}}$
for all $i$, where $W_{k-2,i}$ denotes the sum of the $W$-measures of all points on the $i$th
$(k-2)$-flat of $PG(k-1, 2)$. The additional congruence conditions $W_{c,u} \equiv 0 \pmod{2^c}$ for
all $u$, $c = 1, 2, \ldots, k-3$, are necessary. Geometrical considerations are applied to the
problems of (1) satisfying the congruence conditions, (2) finding the relation between $n$ and $W$
($W$ being defined as the largest integer such that the code associated with the alphabet will
correct all transmission errors of multiplicity up to and including $W$), and (3) counting the
number of $(W + 1)$-tuple errors which will be corrected. Specific results are given for $k = 2, 3, 4$.
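A group alphabet can be generated as all mod-2 sums of $k$ generating letters. The Python sketch below does this, checks closure, and computes the largest $W$ for which all errors of multiplicity at most $W$ are corrected, as $\lfloor (d-1)/2 \rfloor$ with $d$ the smallest nonzero weight (for a group code this equals the minimum distance). The generators form a [7, 4] Hamming code chosen purely for illustration; the geometric construction of the paper is not reproduced, and the names are hypothetical.

```python
from itertools import product


def group_alphabet(generators):
    """All mod-2 sums of the generating letters: a subgroup of the binary n-tuples."""
    n = len(generators[0])
    letters = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [x ^ y for x, y in zip(word, g)]
        letters.add(tuple(word))
    return letters


def weight(letter):
    """Number of unities in a letter."""
    return sum(letter)


def errors_corrected(letters):
    """Largest W such that every error pattern of multiplicity <= W is corrected."""
    d = min(weight(w) for w in letters if any(w))
    return (d - 1) // 2


# Illustrative generators: a systematic [7, 4] Hamming code (k = 4, n = 7).
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
```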


32. A Simple Minimum-Average-Risk Procedure for the Multiple Comparisons
Problem. (Preliminary report) David B. Duncan, University of North
Carolina. (Invited paper)


Let $[y_1, \ldots, y_n, s]$ be a sufficient estimator for $[\mu_1, \ldots, \mu_n, \sigma]$ such that, to take a
typical simple case, $[y_1, \ldots, y_n]$ is normally distributed with mean $[\mu_1, \ldots, \mu_n]$ and
variance $I\sigma^2$, and $s^2$ is the usual form of independent estimate (with $\nu$ degrees of freedom)
for $\sigma^2$. Let $T$ represent the class of $n(n-1)$ differences
$T = \{\tau_{ij} = (\mu_i - \mu_j)/(\sqrt{2}\,\sigma);\ i, j = 1, \ldots, n;\ i \ne j\}$. The subset system of the
multiple comparisons problem considered is that formed as the restricted product of the
two-decision component-problem subset pairs $\tau > 0$, $\tau \le 0$ for all $\tau \in T$. A Bayes solution of
the form indicated in [1] (Duncan, Ann. Math. Stat., 1958, p. 622) is developed for each component
problem. Their simultaneous application, for all $\tau \in T$, is shown (see also [2] Lehmann, Ann.
Math. Stat., 1957, pp. 1-25) to be the Bayes solution to the given multiple comparisons problem
with respect to a loss function formed as the sum of the component loss functions and to a
Bayes function having the component Bayes
functions at its margins. The table of $t$ in [1] is used for each component solution, the test
statistic now also being a function of the residual variance among the $y$’s and having
$\nu + n - 2$ instead of $\nu$ degrees of freedom. (Research jointly supported by the U. S. Air Force
through the Office of Scientific Research of the Air Research and Development Command
and by the U. S. Public Health Service.)
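The component structure of the problem can be sketched in Python: each ordered pair $(i, j)$ gets its own two-decision problem, $\tau_{ij} > 0$ versus $\tau_{ij} \le 0$, decided from the estimate $t_{ij} = (y_i - y_j)/(\sqrt{2}\,s)$. The argument `critical_value` is a hypothetical placeholder: the Bayes critical values come from the table in [1], which is not reproduced here, and the refinement that pools the residual variance among the $y$’s ($\nu + n - 2$ degrees of freedom) is omitted.

```python
import math
from itertools import permutations


def component_decisions(y, s, critical_value):
    """Decide tau_ij > 0 or tau_ij <= 0 for every ordered pair i != j.

    tau_ij = (mu_i - mu_j) / (sqrt(2) * sigma) is estimated by
    t_ij = (y_i - y_j) / (sqrt(2) * s). `critical_value` is a placeholder
    for the Bayes critical value of each component problem.
    """
    decisions = {}
    for i, j in permutations(range(len(y)), 2):  # n(n - 1) ordered pairs
        t_ij = (y[i] - y[j]) / (math.sqrt(2.0) * s)
        decisions[(i, j)] = "tau > 0" if t_ij > critical_value else "tau <= 0"
    return decisions
```

Applying the component rules simultaneously, as the abstract describes, simply means returning all $n(n-1)$ decisions together.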


(Abstracts of papers presented at the Cleveland, Ohio Meeting 
of the Institute, April 2-4, 1959.) 


1. Relations Between Certain Incomplete Block Designs. S. S. Shrikhande,
University of North Carolina.


The following results are proved (here $m_2$ denotes $m(m-1)/2$). Theorem (1). A partially
balanced incomplete block design with two associate classes, and with parameters $v = (n+1)_2$,
$b = n_2$, $r = n - 1$, $k = n + 1$, $n_1 = 2n - 2$, $\lambda_1 = 1$, $\lambda_2 = 2$, $p_{11}^1 = n - 1$, $p_{11}^2 = 4$, has the
triangular association scheme $T_{n+1}$ (Bose and Shimamoto, J.A.S.A., Vol. 47 (1952), pp. 151-184).
Theorem (2). For any value of $n$, the existence of the balanced incomplete block design $D_1$ with
parameters $v = n_2$, $b = (n+1)_2$, $r = n + 1$, $k = n - 1$, $\lambda = 2$ implies the existence of two
p.b.i.b. designs, $D_{11}$ with parameters $v = b = n_2$, $r = k = n - 1$, $n_1 = 2n - 4$, $\lambda_1 = 1$, $\lambda_2 = 2$,
$p_{11}^1 = n - 2$, $p_{11}^2 = 4$, and $D_{12}$ with parameters $v = n_2$, $b = n$, $k = n - 1$, $r = 2$, $n_1 = 2n - 4$,
$\lambda_1 = 1$, $\lambda_2 = 0$, $p_{11}^1 = n - 2$, $p_{11}^2 = 4$, such that $D_1 = D_{11} + D_{12}$, where “+” denotes the
fact that the blocks of $D_{11}$ and $D_{12}$, taken together, give the design $D_1$. Theorem (3). The
existence of the design $D_{11}$ implies the existence of the corresponding design $D_1$ if the
association scheme of $D_{11}$ is triangular ($T_n$). Theorem (4). If $n = 5$ or $n = 9$, the dual of the
design $D_{11}$ is another p.b.i.b. with the same parameters. A constructive method of embedding
the design $D_1$ into the corresponding symmetric b.i.b.d. with parameters
$v = b = (n^2 + n + 2)/2$, $r = k = n + 1$, $\lambda = 2$ is also given.
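The counting identities behind Theorem (2) and the embedding can be checked mechanically. The Python sketch below verifies $vr = bk$ and $\lambda(v - 1) = r(k - 1)$ for $D_1$ and for the symmetric design, confirms that $D_{11}$ and $D_{12}$ together have the $(n+1)_2$ blocks of $D_1$, and, for $n = 3$, takes the residual of a symmetric design with $v = b = 7$, $r = k = 4$, $\lambda = 2$ (complements of the lines of the Fano plane, an example chosen here). Taking a residual illustrates the relationship in the opposite direction and is a standard construction, not the paper's embedding method; the names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def c2(m):
    """m(m - 1)/2, written m_2 in the abstract."""
    return m * (m - 1) // 2


def bibd_conditions_hold(v, b, r, k, lam):
    """Necessary counting conditions vr = bk and lam(v - 1) = r(k - 1)."""
    return v * r == b * k and lam * (v - 1) == r * (k - 1)


def d1_parameters(n):
    """Parameters (v, b, r, k, lam) of the design D_1 of Theorem (2)."""
    return c2(n), c2(n + 1), n + 1, n - 1, 2


def residual(blocks, index):
    """Delete block `index` and remove its points from every other block."""
    base = blocks[index]
    return [frozenset(blk - base) for i, blk in enumerate(blocks) if i != index]


def pair_counts(blocks):
    """Number of blocks containing each unordered pair of points."""
    counts = Counter()
    for blk in blocks:
        counts.update(combinations(sorted(blk), 2))
    return counts


# Symmetric b.i.b.d. with v = b = 7, r = k = 4, lam = 2 (the case n = 3).
FANO_LINES = [{0, 1, 3}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}]
BIPLANE_7 = [set(range(7)) - line for line in FANO_LINES]
```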


2. Quasi-Ranges of Samples from an Exponential Distribution. Paul R.
Rider, Wright Air Development Center.


The distribution of quasi-ranges of samples from an exponential population is given, as 
are formulas for the cumulants of the distribution. 
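For unit-mean exponential samples, the Rényi representation writes the $r$th quasi-range $x_{(n-r)} - x_{(r+1)}$ as a sum of independent exponential spacings with means $1/i$, $i = r + 1, \ldots, n - r - 1$, so its $m$th cumulant is $(m-1)!\sum_i i^{-m}$ (multiply by $\theta^m$ when the mean is $\theta$). The Python sketch below codes this together with a simulation check. The route via spacings is a standard one chosen here, not necessarily the derivation used in the paper, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import factorial


def quasi_range_cumulant(n, r, m):
    """Exact m-th cumulant of x_(n-r) - x_(r+1) for n unit-mean exponentials.

    Renyi representation: the quasi-range is a sum of independent
    exponential spacings with means 1/i, i = r + 1, ..., n - r - 1.
    """
    if r < 0 or n < 2 * r + 2:
        raise ValueError("need 0 <= r and n >= 2r + 2")
    return factorial(m - 1) * sum(Fraction(1, i ** m) for i in range(r + 1, n - r))


def simulated_quasi_range_mean(n, r, reps=50000, seed=0):
    """Monte Carlo mean of x_(n-r) - x_(r+1), order statistics numbered from 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = sorted(rng.expovariate(1.0) for _ in range(n))
        total += x[n - r - 1] - x[r]
    return total / reps
```

With $r = 0$ this gives the range, whose mean is the harmonic sum $1 + 1/2 + \cdots + 1/(n-1)$.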


3. Asymptotic Rate of Discrimination for Markov Processes. Lambert H.
Koopmans, Sandia Corporation.


Simple hypotheses $H_P$ and $H_Q$ specifying two distinct, positive transition densities
$p(x \mid y)$ and $q(x \mid y)$ and positive initial densities $\pi_P(x)$ and $\pi_Q(x)$ with respect to a finite
Lebesgue-Stieltjes measure are assumed for a discrete time parameter Markov process.
Let $R_n$ be the likelihood ratio statistic based on the first $n + 1$ observations of the
process, and consider the class of sequences of likelihood ratio tests
$T(a) = \{[R_n > na]: n = 0, 1, 2, \ldots\}$ generated by letting $a$ vary in the interval $-\infty < a < \infty$.
If the function $K_t(x, y) = p^{1-t}(x \mid y)\, q^t(x \mid y)$ satisfies a certain regularity condition, it is
shown that there exist limits $l_P$ and $l_Q$, $l_P < 0 < l_Q$, independent of the initial densities,
such that the sequences $T(a)$ are consistent for $l_P < a < l_Q$ and inconsistent in the complement
of the closure of this interval. Furthermore, the rate at which the error probabilities tend to
zero is exponential for $l_P < a < l_Q$. An asymptotic rate of discrimination $\rho(P, Q)$ is defined
which is a measure of the limiting behavior of the class of consistent likelihood ratio sequences
$T(a)$. It is shown that $\rho(P, Q)$ is the infimum, over the unit interval, of the largest
eigenvalue of the integral operator with kernel $K_t(x, y)$. Several examples are considered, and
an extension to Markov processes with respect to arbitrary Lebesgue-Stieltjes measures
is indicated.
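On a finite state space with counting measure (a finite measure, so a special case of the setting above), the integral operator with kernel $K_t$ becomes a matrix, and $\rho(P, Q)$ can be approximated by a grid search over $t$ of its Perron root. The Python sketch below assumes that discrete setting, uses power iteration, and replaces the exact infimum by a minimum over a grid; all names are hypothetical. When every row of each transition matrix is the same (independent observations), $K_t$ has rank one and its eigenvalue reduces to $\sum_x p(x)^{1-t} q(x)^t$.

```python
def largest_eigenvalue(matrix, iterations=200):
    """Perron root of a square matrix with positive entries, by power iteration."""
    size = len(matrix)
    vec = [1.0 / size] * size  # positive start vector summing to 1
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        value = sum(nxt)  # vec sums to 1, so this is the growth factor
        vec = [x / value for x in nxt]
    return value


def kernel(p, q, t):
    """Matrix K_t with K_t[x][y] = p(x|y)**(1 - t) * q(x|y)**t.

    p[y][x] and q[y][x] are the transition probabilities from state y to x.
    """
    size = len(p)
    return [[p[y][x] ** (1.0 - t) * q[y][x] ** t for y in range(size)]
            for x in range(size)]


def discrimination_rate(p, q, grid=200):
    """Approximate inf over t in [0, 1] of the largest eigenvalue of K_t, on a grid."""
    return min(largest_eigenvalue(kernel(p, q, i / grid)) for i in range(grid + 1))
```

At $t = 0$ and $t = 1$ the kernel is a transposed transition matrix with Perron root 1, so the rate is at most 1 and drops below 1 when $P$ and $Q$ differ.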






4. Distribution of a Quadratic Form in Three Variables. (Preliminary report)
André G. Laurent, Wayne State University.


Let $X = (u, v, w)'$ be normally distributed $N(0, S)$, $S > 0$. A series expansion involving
confluent hypergeometric functions is proposed for the distribution of $X'X$. Further,
$P(X'X \le R^2) = \sum_i a_i I_i(R) R^{2i}$, where the $a_i$ are the coefficients of the series expansion
corresponding to the two-dimensional case and $I_i$ can be obtained by recursion from $I_0$ and
$I_1$, which are easy to compute with the help of a table of normal integrals and ordinates.
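A simple Monte Carlo evaluation of $P(X'X \le R^2)$ is a useful check on any truncated series. In the special case $S = I$, $X'X$ is chi-square with 3 degrees of freedom and $P(X'X \le R^2) = 2\Phi(R) - 1 - 2R\varphi(R)$, which indeed needs only a normal integral and an ordinate. The Python sketch below codes both; it is not the series expansion of the paper, and the function names are hypothetical.

```python
import math
import random


def cholesky3(s):
    """Lower-triangular L with L L' = s, for a 3 x 3 positive definite matrix s."""
    l = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            total = s[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(total) if i == j else total / l[j][j]
    return l


def prob_quadratic_form_mc(s, radius, reps=100000, seed=0):
    """Monte Carlo estimate of P(X'X <= R^2) for X ~ N(0, S) in three dimensions."""
    rng = random.Random(seed)
    l = cholesky3(s)
    hits = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x = [sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(3)]  # X = L z
        if sum(xi * xi for xi in x) <= radius * radius:
            hits += 1
    return hits / reps


def prob_identity_case(radius):
    """Exact P(chi-square_3 <= R^2) = 2 Phi(R) - 1 - 2 R phi(R), the case S = I."""
    return (math.erf(radius / math.sqrt(2.0))
            - radius * math.sqrt(2.0 / math.pi) * math.exp(-radius * radius / 2.0))
```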


5. On Exponentially-Mapped-Past Statistical Variables. (Preliminary report) 
JOSEPH OTTERMAN, Willow Run Laboratories. 


The exponentially-mapped-past statistical variables are quantities relating to a set of 
observations computed in such a way that the recent values of the observations contribute 
more strongly than the values observed in the more distant past. The relative weighting is 
a geometrical ratio in the case of discrete (naturally discrete or sampled) data and an 
exponential function in the case of continuously observed functions. In this paper defini- 
tions of some exponentially-mapped-past variables are introduced and certain simple 
relationships are discussed. The distinct computational advantages of the e.m.p. variables, 
such as e.m.p. average and e.m.p. variance, are pointed out. 
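For sampled data, one common way to realize geometric weighting is the pair of recursions $m_t = m_{t-1} + \alpha(x_t - m_{t-1})$ and $v_t = (1 - \alpha)[v_{t-1} + \alpha(x_t - m_{t-1})^2]$. They reproduce exactly the weighted mean and weighted variance with weight $\alpha(1-\alpha)^j$ on the observation $j$ steps back and the remaining weight $(1-\alpha)^t$ on the first observation. The Python sketch below assumes this form; Otterman's own definitions may be normalized differently, and the class name is hypothetical. It illustrates the computational advantage mentioned above: each update costs $O(1)$ and no past observations are stored.

```python
class EmpStatistics:
    """Exponentially-mapped-past average and variance for sampled data.

    Each new observation gets weight alpha and all older weights are
    multiplied by (1 - alpha), so the weighting is a geometric ratio.
    """

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.alpha = alpha
        self.mean = None
        self.variance = 0.0

    def update(self, x):
        """O(1) update; the first observation initializes the average."""
        if self.mean is None:
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.variance = (1.0 - self.alpha) * (self.variance + self.alpha * delta * delta)
```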


6. Simultaneous Comparison of the Optimum and Sign Tests of a Normal
Mean. R. R. Bahadur, Indian Statistical Institute. (By title)


This paper gives a detailed example of a general method of comparing two tests. Consider
a sample of $n$ independent observations from an $N(\mu, 1)$ population and suppose it is
desired to test $\mu = 0$ against $\mu > 0$. Let $L_0(n)$ and $L_1(n)$ denote the significance levels
actually attained in the given case by the optimum and sign tests, respectively. The paper
studies the asymptotic joint distribution of $L_0(n)$ and $L_1(n)$. It is shown that if $\mu = 0$ the
limiting distribution of $L_0$ and $L_1$ is that of $G(U)$ and $G(V)$, where $G$ is the $N(0, 1)$
distribution function, and $U$ and $V$ are correlated $N(0, 1)$ variables, the correlation being $(2/\pi)^{1/2}$. In case
$\mu > 0$, $L_0$ and $L_1$ tend to zero, the asymptotic relationship being (roughly speaking)
$L_0(n) \approx [L_1(n)]^{1/\tau}$, where $\tau$ is a constant depending on $\mu$ such that $0 < \tau < 1$ and
such that $\tau$ decreases as $\mu$ increases. The question of estimating or predicting the actual
value of $L_0$ given only $n$ and $L_1$ is discussed. It is shown that, with attainment of an assigned
level as the criterion, $\varphi = 2\log_e[2p^p q^q]/\mu^2$ serves as the asymptotic efficiency of the sign
test, where $p = P(X > 0 \mid \mu)$ and $q = 1 - p$. If the variance is not known to be 1, $\mu^2$ is
replaced by $\log(1 + \mu^2)$ in the formula for $\varphi$.
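The two attained levels and the efficiency $\varphi$ are easy to compute. The Python sketch below assumes unit variance and the test of $\mu = 0$ against $\mu > 0$, takes $L_0 = 1 - \Phi(\sqrt{n}\,\bar{x})$ for the optimum test and $L_1 = P(\mathrm{Bin}(n, 1/2) \ge S)$, with $S$ the number of positive observations, for the sign test, and evaluates $\varphi$ with $p = \Phi(\mu)$. The function names are hypothetical. For small $\mu$, $\varphi$ approaches $2/\pi$, the square of the correlation $(2/\pi)^{1/2}$ quoted above.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def attained_levels(sample):
    """(L0, L1) for the test of mu = 0 against mu > 0, with unit variance.

    L0: attained level of the optimum test based on the sample mean.
    L1: attained level of the sign test, P(Bin(n, 1/2) >= number of positives).
    """
    n = len(sample)
    mean = sum(sample) / n
    l0 = 1.0 - std_normal_cdf(math.sqrt(n) * mean)
    positives = sum(1 for x in sample if x > 0)
    l1 = sum(math.comb(n, j) for j in range(positives, n + 1)) / 2 ** n
    return l0, l1


def sign_test_efficiency(mu, variance_known=True):
    """phi = 2 log_e[2 p^p q^q] / mu^2 with p = P(X > 0 | mu) = Phi(mu), q = 1 - p.

    With variance_known=False, mu is read as mu / sigma and mu^2 in the
    denominator is replaced by log(1 + mu^2).
    """
    p = std_normal_cdf(mu)
    q = 1.0 - p
    numerator = 2.0 * (math.log(2.0) + p * math.log(p) + q * math.log(q))
    denominator = mu * mu if variance_known else math.log(1.0 + mu * mu)
    return numerator / denominator
```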


7. Some Estimates of the Binomial Distribution Function. R. R. Bahadur,
Indian Statistical Institute. (By title)

Let $p$ be given, $0 < p < 1$. Let $n$ and $k$ be positive integers such that $np \le k \le n$, and
let $B_n(k) = \sum_{r=k}^{n} \binom{n}{r} p^r q^{n-r}$, where $q = 1 - p$. It is shown that
$B_n(k) = \left[\binom{n}{k} p^k q^{n-k}\right] \cdot q \cdot F(n+1, 1; k+1; p)$, where $F$ denotes the hypergeometric
function. This representation seems useful for numerical as well as theoretical investigations
of small tail probabilities.

The representation yields, in particular, the result that with
$A_n(k) = \binom{n}{k} p^k q^{n-k+1} \cdot (k+1)/[k+1-(n+1)p]$, we have $1 \le A_n(k)/B_n(k) \le 1 + z^{-2}$,
where $z = (k - np)(npq)^{-1/2}$ (Theorem 1). Next, let $N_n(k)$ denote the normal approximation to
$B_n(k)$, and let $C_n(k) = [z + (q/np)^{1/2}] \cdot (2\pi)^{1/2} \cdot \exp(z^2/2)$. It is shown that
$A_n N_n C_n / B_n \to 1$ as $n \to \infty$, provided only that $k$ varies with $n$ so that $z \ge 0$ for each $n$
(Theorem 2). It follows that $A_n/B_n \to 1$ if and only if $z \to \infty$ (i.e. $B_n \to 0$) (Corollary 1). It
also follows that $N_n/B_n \to 1$ if and only if $A_n C_n \to 1$. This last condition reduces to
$z = o(n^{1/6})$ for certain values of $p$, but is weaker than this for the other values; in particular,
there are values of $p$ for which $A_n C_n \to 1$ can hold without requiring even that $k/n \to p$
(Corollary 3).
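Both the hypergeometric representation and the bounds of Theorem 1 can be checked numerically. The Python sketch below sums $F(n+1, 1; k+1; p)$ term by term (its term ratio is $(n+1+j)p/(k+1+j)$, and the series converges because $p < 1$) and compares the result, and $A_n(k)$, with the exact tail. The helper names are hypothetical, and the normal-approximation part (Theorem 2) is not coded.

```python
import math


def binomial_tail(n, k, p):
    """B_n(k) = sum over r = k..n of C(n, r) p^r q^(n - r)."""
    q = 1.0 - p
    return sum(math.comb(n, r) * p ** r * q ** (n - r) for r in range(k, n + 1))


def hyp2f1_b1(a, c, x, tol=1e-17, max_terms=100000):
    """Gauss series F(a, 1; c; x) for 0 <= x < 1, summed term by term."""
    total, term, j = 0.0, 1.0, 0
    while j < max_terms:
        total += term
        term *= (a + j) * x / (c + j)  # ratio of successive terms when b = 1
        j += 1
        if term < tol * total:
            break
    return total


def tail_via_hypergeometric(n, k, p):
    """[C(n, k) p^k q^(n - k)] * q * F(n + 1, 1; k + 1; p)."""
    q = 1.0 - p
    return math.comb(n, k) * p ** k * q ** (n - k) * q * hyp2f1_b1(n + 1, k + 1, p)


def bahadur_a(n, k, p):
    """A_n(k) = C(n, k) p^k q^(n - k + 1) (k + 1) / (k + 1 - (n + 1) p)."""
    q = 1.0 - p
    return math.comb(n, k) * p ** k * q ** (n - k + 1) * (k + 1) / (k + 1 - (n + 1) * p)


def z_value(n, k, p):
    """z = (k - np) / sqrt(npq)."""
    return (k - n * p) / math.sqrt(n * p * (1.0 - p))
```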


8. Distribution-Free and Nonparametric Tolerance Regions: The Exponential
Case. (Preliminary report) Leo A. Goodman and Albert Madansky,
University of Chicago and The Rand Corporation. (By title)


Exact one-sided and approximate two-sided $\beta$-content tolerance regions at confidence
level $C$ are developed based on the first $r$ ordered observations from a sample of $n$
exponentially distributed variables. These regions are compared with nonparametric one-sided
and two-sided tolerance regions. Optimal properties of these regions are discussed, as is the
asymptotic behavior of the tolerance regions. It is also shown that these regions are
distribution-free in the sense defined by Fraser and Guttman, Ann. Math. Stat., Vol. 27 (1956),
pp. 162-179. The effect of assuming an exponential distribution when in fact the distribution
is a mixture of two exponentials is discussed. Also, uniformly most powerful invariant
one-sided and two-sided $\beta$-expectation tolerance regions are derived. Some of the results
presented by Goodman, Ann. Math. Stat., Vol. 24 (1953), pp. 139-140, are extended.
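For the one-sided case, a standard construction (assumed here, and not necessarily identical to the paper's) uses the total time on test $T = \sum_{i=1}^{r} x_{(i)} + (n - r)x_{(r)}$, for which $2T/\theta$ is chi-square with $2r$ degrees of freedom when the mean is $\theta$. Setting $L = -\log\beta \cdot 2T/\chi^2_{2r;C}$, with $\chi^2_{2r;C}$ the $C$-quantile, makes $[L, \infty)$ contain at least a fraction $\beta$ of the population with probability exactly $C$. The Python sketch below codes this with the closed-form even-degree chi-square distribution and bisection; the names are hypothetical.

```python
import functools
import math


def chi2_even_cdf(x, r):
    """CDF of chi-square with 2r d.f.: 1 - exp(-x/2) * sum_{j<r} (x/2)^j / j!."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, r):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


@functools.lru_cache(maxsize=None)
def chi2_even_quantile(prob, r):
    """prob-quantile of chi-square with 2r degrees of freedom, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_even_cdf(hi, r) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_even_cdf(mid, r) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lower_tolerance_limit(first_r, n, beta, conf):
    """L such that [L, inf) has content >= beta with confidence conf.

    first_r holds the r smallest of n exponential observations, in order.
    """
    r = len(first_r)
    total_time = sum(first_r) + (n - r) * first_r[-1]
    theta_lower = 2.0 * total_time / chi2_even_quantile(conf, r)
    return -math.log(beta) * theta_lower
```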


9. Bayesian Lot-by-lot Sampling Inspection. Herbert B. Eisenberg, Iowa
State College. (By title)


Based on the work of Arrow, Blackwell and Girshick (Econometrica, 1949), this paper
derives Bayesian single, double and sequential attribute sampling plans. The lot quality
distributions considered are the binomial, two-point, and degenerate one-point. The loss
function considered is negative profit, given by $L_{\mathrm{acc}} = -sd - uN(1-p) + u_s(Np - d) + ni$ and
$L_{\mathrm{rej}} = -sNp - uN(1-p) + Ni$, $N$ being the size of the lot. Profit efficiencies of single and
double sampling plans relative to sequential plans are computed in specific cases. Partial
ignorance is considered by evaluating the loss incurred when optimizing with respect to the
wrong lot quality distribution. As expected, sampling never pays in the binomial case; in
the two-point case, the optimum sequential plan is not necessarily a hypergeometric SPRT;
indeed, the acceptance portion of the boundary need not be connected.
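One way to see why sampling never pays in the binomial case: if the number $D$ of defectives in the lot is binomial $(N, \bar p)$ and the sample is drawn without replacement, then the posterior distribution of the defectives among the $N - n$ uninspected items is again binomial $(N - n, \bar p)$, whatever the observed $d$. The Python sketch below verifies this by direct enumeration; $\bar p$ (`p` in the code) and the function names are assumptions introduced for illustration, and the loss function above is not coded.

```python
import math


def binom_pmf(m, j, p):
    """Binomial(m, p) probability of j."""
    return math.comb(m, j) * p ** j * (1.0 - p) ** (m - j)


def remaining_defectives_posterior(N, n, d, p):
    """Posterior of the number of defectives among the N - n uninspected items.

    Prior: lot defectives D ~ Binomial(N, p). Data: d defectives in a sample
    of n drawn without replacement (hypergeometric likelihood).
    """
    weights = {}
    for D in range(d, N - n + d + 1):  # D >= d and N - D >= n - d
        prior = binom_pmf(N, D, p)
        likelihood = math.comb(D, d) * math.comb(N - D, n - d) / math.comb(N, n)
        weights[D - d] = prior * likelihood
    total = sum(weights.values())
    return {rest: w / total for rest, w in weights.items()}
```

Because the posterior for the uninspected items does not depend on $d$, the sample carries no information that could change the decision about them.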


10. Combining Inter-block and Intra-block Information in Balanced
Incomplete Block Designs. Franklin A. Graybill and David L. Weeks,
Oklahoma State University. (By title)


When an Eisenhart Model III (blocks random, error random) is assumed in a balanced
incomplete block design, two independent estimates of treatment differences have been
exhibited by Yates. A combined estimate of treatment differences has also been set forth by
Yates, but none of the properties of the combined estimate have been given.

It is the purpose of this paper to show that Yates’ combined estimate is based on a set 
of minimal sufficient statistics. A combined estimate is set forth in the paper which is shown 
to be unbiased and which is also based on a set of minimal sufficient statistics. 
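The idea behind a combined estimate can be sketched as inverse-variance weighting of the two independent unbiased estimates. The Python sketch below assumes the intra-block and inter-block variances are known; Yates' procedure instead substitutes estimates of them, which is what makes the properties of the combined estimate nontrivial. The function name is hypothetical.

```python
def combine_estimates(intra, var_intra, inter, var_inter):
    """Inverse-variance weighted combination of two independent unbiased estimates.

    Returns (combined estimate, its variance), valid when both variances are known.
    """
    w_intra = 1.0 / var_intra
    w_inter = 1.0 / var_inter
    combined = (w_intra * intra + w_inter * inter) / (w_intra + w_inter)
    return combined, 1.0 / (w_intra + w_inter)
```

With known variances the combination is unbiased and has smaller variance than either component; once estimated weights are used, neither property is automatic.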


11. Minimal Sufficient Statistics in Incomplete Block Designs, Model II. David
L. Weeks and Franklin A. Graybill, Oklahoma State University. (By
title)


Under the assumption of an Eisenhart Model II in a balanced incomplete block design,
minimal sufficient statistics are exhibited which have dimension six. These six statistics
can be found from the quantities which are used to obtain an analysis of variance by the
recovery of inter-block information method. The distribution of the six statistics is
discussed. Similar results have been found for certain types of partially balanced incomplete
block designs with two associate classes.


12. Some Theorems Concerning Eisenhart’s Model II. Franklin A. Graybill
and Robert Hultquist, Oklahoma State University. (By title)


Eisenhart’s analysis of variance Model II can be written as follows: $Y = X\beta + e$ or
$Y = \sum_{i} X_i \beta_i + e$, where $\beta_0 = \mu$ is a scalar constant; $\beta_i$ $(i \ne 0)$ is a vector of $p_i$
random variables such that $E(\beta_i) = 0$ and $E(\beta_i \beta_i') = \sigma_i^2 I$; $e$ is a vector of $n$ random
variables such that $E(e) = 0$ and $E(ee') = \sigma_{k+1}^2 I$; and all random variables are independent.
This model is studied with respect to point estimation. Under the assumption that all random
variables are normal, theorems on the following were proved: (1) the maximum and minimum
number of distinct characteristic roots of the covariance matrix of $Y$; (2) conditions on
the design matrix $X$ for complete sufficient statistics to exist; (3) minimal sufficient
statistics when the design matrix satisfies certain conditions; (4) analysis of variance
estimators. When the random variables are not normal, theorems on the following were
proved: (1) uniformly best (minimum variance) unbiased quadratic estimates of the $\sigma_i^2$;
(2) estimable functions of the $\sigma_i^2$.
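As a small illustration of result (1) under normality, consider a balanced one-way layout with one random factor: $a$ groups of $m$ observations, so that $\mathrm{Cov}(Y) = \sigma_a^2 X_1 X_1' + \sigma_e^2 I$. Group indicator vectors are characteristic vectors with root $\sigma_e^2 + m\sigma_a^2$, and within-group contrasts have root $\sigma_e^2$, so there are exactly two distinct roots. The Python sketch below builds the matrix so both facts can be checked; the layout, the symbols $\sigma_a^2$, $\sigma_e^2$, and the names are assumptions for illustration.

```python
def one_way_incidence(groups, per_group):
    """Incidence matrix X_1 (n x groups) of a balanced one-way random classification."""
    return [[1 if g == row // per_group else 0 for g in range(groups)]
            for row in range(groups * per_group)]


def covariance_matrix(incidence, var_factor, var_error):
    """Cov(Y) = var_factor * X X' + var_error * I."""
    n = len(incidence)
    return [[var_factor * sum(a * b for a, b in zip(incidence[r], incidence[c]))
             + (var_error if r == c else 0.0)
             for c in range(n)]
            for r in range(n)]


def mat_vec(matrix, vec):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, vec)) for row in matrix]
```

For example, with 3 groups of 4, $\sigma_a^2 = 2$ and $\sigma_e^2 = 1$, multiplying the covariance matrix by the indicator of the first group returns 9 times that indicator, and multiplying by the contrast $(1, -1, 0, \ldots, 0)$ returns the contrast unchanged.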





NEWS AND NOTICES 


Readers are invited to submit news items of interest to the Secretary of the
Institute.


Personal Items 


Robert Bechhofer of the Department of Industrial and Engineering Adminis- 
tration, Sibley School of Mechanical Engineering, Cornell University, is on 
sabbatic leave for the academic year 1958-59. He is spending the period at 
Stanford University where he is Visiting Professor of Statistics in the Applied 
Mathematics and Statistics Laboratory and in the Department of Preventive 
Medicine. 

Douglas G. Chapman is on leave of absence from the University of Washington 
during the current academic year and is a Visiting Professor at the Department 
of Experimental Statistics, North Carolina State College, Raleigh, North Caro- 
lina. 

Dr. Sebastian B. Littauer has been appointed as consultant in Operations 
Research and Statistical Quality Control at Calkin and Bayley, Inc., Industrial 
Consultants, New York. Dr. Littauer is Professor in Statistical Quality Control 
and Operations Research at Columbia University. 


New Members 


The following persons have been elected to membership in the Institute
from October 17, 1958 to February 6, 1959.


Ady, Robert M., B.S. (Monmouth College), Senior Engineer, Sylvania-Reconnaissance
Systems Laboratory, Box 188, Mountain View, California.

Asher, Harold, Ph.D., (Ohio State U.), Manager-Cost Analysis, Technical Military Plan- 
ning Operation, General Electric Co., 735 State St., Santa Barbara, Calif. 

Boes, Duane C., M.S., (Purdue U), Research Fellowship, Purdue University, Statistical 
Laboratory, Purdue University, Lafayette, Indiana. 

Brown, Bradford S., M.S. (U. of Illinois), Service Engineer, Engineering Dept., E. I. DuPont
de Nemours and Co., Buffalo Ave. and Chemical Road, Niagara Falls, N.Y.

Calvin, James S., Ph.D., (Yale U.), Professor, Psychology Dept., U. of Kentucky,
Lexington, Kentucky, Department of Statistics, U. of California, Berkeley, Calif.

Charles, Gerald T., B.S., (Roosevelt U.), Statistician, Remington-Rand Univac, 1902 West 
Minnehaha Ave., St. Paul, Minn., 1402 N. DuPont Ave., Minneapolis 11, Minnesota. 

Chorneyko, Thor Z., M.A., (Saskatchewan), Graduate Assist., University of Alberta, Edmon- 
ton, Alberta, Canada. 

Crawford, Gordon Bradley, B.S., (U. of Oregon), Graduate Assistant, Dept. of Mathematics, 
University of Oregon, Eugene, Oregon. 

Drew, Bruce A., B.S., (Wayne University), Chemist, Sr., Hercules Powder Co., Hopewell, 
Virginia, 205 South 4th Ave., Hopewell, Virginia. 

Fasbinder, Stanley, M.A., (Wayne State U.), Teaching Fellow, Wayne State University,
Detroit 2, Michigan, 620 W. Forest, Detroit 1, Michigan.

Feldman, Dorian, B.S., (Yale U.), Research Assistant, Statistics Dept., University of Cali- 
fornia, Berkeley 4, California. 



Freiberger, Walter F., Ph.D., (Cambridge, Eng.), Associate Professor, Mathematics Dept., 
Brown University, Providence 12, R. I. 

Hachigian, Jack, B.S., (U. of Michigan), Student, Mathematics Dept., Indiana University, 
Bloomington, Indiana. 

Hamaker, H. C., Dr. (Utrecht U.), Chief Statistician, Philips Research Laboratories,
Eindhoven, Netherlands, c/o Bibliotheekcentrale, N. V. Philips’ Gloeilampenfabrieken,
Eindhoven, Netherlands.

Hastings, Wilfred Keith, M.A., (U. of Toronto), Electronic Data Processing Consultant,
H. S. Gellman and Co., Ltd., 110 Bloor St. W., Toronto, Ontario, Canada, 1580 Bathurst
St., Apt. 39, Toronto, Ontario, Canada.

Hickman, David James, B.S., (Morehouse College), Instructor of Mathematics, Mississippi 
Vocational College, Itta Bena, Mississippi. 

Hickman, Donald Louis, B.S., (Morehouse College), Instructor of Mathematics, Mississippi 
Vocational College, Itta Bena, Mississippi. 

Ibbetson, David Noel Isserlis, B.Sc., (Imperial College, London), Statistician, Fisons Ltd.,
Harvest House, Felixstowe, Suffolk, England, 8A Station Road, Edgware, Middx, England.

Knight, William R., M.A., (U. of British Columbia), Graduate Student, University of 
Toronto, Toronto 5, Ontario, Canada, 690 Huron St., Toronto 5, Ontario, Canada. 

Kono, Kazumasa, M.Sc., (Kyushu U.), Assistant Professor, Junior Course, Kyushu U.,
Otsubo Machi, Fukuoka, Japan.

Legault, Richard R., M.A., (Chicago U.), Graduate Student, U. of Michigan, Dept. of 
Math., Ann Arbor, Michigan, 709 Indianola, Ann Arbor, Michigan. 

Marshall, Clifford W., M.S., (Polytechnic Institute of Brooklyn), Staff Member, Weapons
Systems Evaluation Group, Institute for Defense Analyses, The Pentagon, Washington
25, D. C., 264 North Ocean Ave., Patchogue, N. Y.

Miloslavsky, George C., Greigetender, Loma Art Textile Co., Inc., 52 E. 19th Street, New 
York 3, N. Y., 80 Sheridan Avenue, Staten Island, 4, N.Y. 

Mittman, Arthur, Ph.D., (State U. of Iowa), Director, University Examinations Service,
State U. of Iowa, Examinations Service, 114 University Hall, Iowa City, Iowa.

Mode, Charles J., Ph.D., (U. of Cal.), Assist. Prof., Montana State College, Bozeman, 
Department of Math., M.S. C., Bozeman, Montana. 

Noel, Charles A., B.S. (Manhattan College), Associate Mathematician, I.B.M., Watson
Laboratory, 612 W. 115th Street, New York 25, N.Y.

Onate, Burton T., M.S., (Iowa State College), Assistant Director, Office of Statistical Co- 
ordination and Standards, National Economic Council, Padre Faura, Manila, Philip- 
pines, College, Laguna, Philippines. 

Richards, Dale Owen, M.S., (Iowa State College), Instructor, Iowa State College, Indus- 
trial Engineering Dept., Ames, Iowa, Route 10 Aplin Road, Ames, Iowa. 

Sand, Francis M., M.S. (Brooklyn Polytechnic Institute), Graduate Student, General 
Electric Foundation Fellow, Princeton University, Fine Hall, Princeton University, 
Princeton, N. J. 

Sathe, Yashawant Sadashiv, M.Sc., (Bombay University), Graduate Student, Dept. of
Mathematics, University of Alberta, Edmonton, Alberta, Canada.

Schwartz, Fred J., B.S., (College of Engr., N. Y. U.), Technical Writer, McGraw-Hill Book 
Co., Inc., 330 West 42nd Street, New York 36, New York, 1015 Washington Avenue, 
Brooklyn 25, N.Y. 

Thigpen, Charles C., M.S., (Univ. of Tenn.), Stat. Dept., University of Tennessee, Knoxville,
Tennessee.

Thomas, Ralph E., M.A., (Ohio State U.), Assistant Division Consultant, Battelle Memorial 
Institute, 505 King Avenue, Columbus 1, Ohio. 

Vanderbeck, John P., Graduate of Pharmacy (St. Louis College of Pharmacy), Head, Test
and Evaluation Branch, Code 3035, U. S. Naval Ordnance Test Station, Test
Department, Michelson Laboratory, China Lake, California, 207 B. Mitscher, China Lake,
California.

Wonnacott, Thomas H., B.A., (U. of Western Ontario), Graduate Student, Princeton Univ., 
Graduate College, Princeton, New Jersey. 

Zurick, Leslie J., M.A., (Columbia U.), Statistician, Owens-Corning Fiberglas, Ashton,
R. I., Fundamental Plastics Research, Owens-Corning Fiberglas, Ashton, R. I.




NEW EDITORIAL STAFF FOR MTAC 


The Division of Mathematics of the National Academy of Sciences—National 
Research Council, Washington, D. C., announces that Harry Polachek, Techni- 
cal Director of the Applied Mathematics Laboratory of the David Taylor Model 
Basin, has been appointed Chairman of the Editorial Committee for the quar- 
terly journal Mathematical Tables and Other Aids to Computation effective Janu- 
ary 1959. He succeeds C. B. Tompkins of the University of California at Los 
Angeles, who held the post since November 1954. The other members of the 
Editorial Committee are: C. C. Craig, A. Fletcher, E. Isaacson, D. Shanks, C. 
V. L. Smith, A. H. Taub, C. B. Tompkins and J. W. Wrench, Jr. 

Mathematical Tables and Other Aids to Computation was founded in 1943 by 
R. C. Archibald with the aid of a grant from the Rockefeller Foundation and is 
published by the Division of Mathematics of the National Academy of Sciences— 
National Research Council. It features original papers in numerical analysis, 
high speed computer methods and other aids to computation as well as short 
articles reporting upon the latest developments in these fields. It serves as an 
information center on tables and other aids to computation appearing in the 
current literature not only in the fields of mathematics, physics, statistics, as- 
tronomy and navigation, but also in such areas as actuarial science, aeronautics, 
chemistry, engineering, geodesy, medicine and meteorology. A file of unpub- 
lished mathematical tables is maintained for use by subscribers. 

Articles for publication in Mathematical Tables and Other Aids to Computation 
should be addressed to Harry Polachek, Editor, Mathematical Tables and Other 
Aids to Computation, David Taylor Model Basin, Washington 7, D. C. 

Information on subscriptions may be obtained from National Academy of 
Sciences, Printing and Publishing Office, 2101 Constitution Avenue, Washing- 
ton 25, D.C. 




INTERNATIONAL CONGRESS OF MATHEMATICIANS 1962 


1. The Secretary of the International Mathematical Union has announced 
that: The Committee authorized by the final plenary session of the Congress in 
Edinburgh to determine the location of the International Congress 1962 has 
accepted an invitation from the Swedish National Committee for Mathematics 
and the Swedish Mathematical Society in the following terms: 





“To Mathematicians of all Countries. 

The Swedish National Committee for Mathematics and the Swedish Mathe- 
matical Society have the honour of inviting you to the next International Con- 
gress of Mathematicians, to be held in Stockholm during the Summer of 1962. 

We will do our best to make the Congress scientifically successful and
enjoyable, hoping that it will stimulate the interaction between mathematicians in
different fields and countries (Åke Pleijel, Chairman of the Swedish National
Committee for Mathematics, and Göran Borg, Chairman of the Swedish
Mathematical Society).”

2. The Republic of China (Taiwan) has been admitted to the International 
Mathematical Union through the Chinese Mathematical Society (Taipei, Tai- 
wan) as the adhering organization. 




STATISTICAL RESEARCH MONOGRAPHS 
(Revised announcement) 


The Institute of Mathematical Statistics and the University of Chicago have 
established a series of publications entitled Statistical Research Monographs. 

The primary purpose of this series is to provide a medium of publication for 
material of interest to statisticians that is not ordinarily provided for by exist- 
ing media. It will help fill the gap between journal articles and textbooks or 
treatises. Among the kinds of publications envisaged are: 

New research results too lengthy for the usual journal article. In particular, 
authors will have ample scope for detailed exposition of their findings. 

Research results of interest in both theoretical and applied statistics. At pres- 
ent authors of such material frequently find it necessary to publish part of their 
results in a theoretical journal and part in an applied journal. 

Expository monographs in particular areas of statistics. 

Discussions of statistical problems and techniques in particular areas of appli- 
cation. 

Every attempt will be made to maintain the highest standards of scholarship. 

Members of the Institute of Mathematical Statistics will receive a one-third 
discount on prepublication orders for monographs of the series when such orders 
are placed through the Treasurer of the I.M.S. A smaller discount will apply to 
post-publication orders. 

The Editorial Board consists of David Blackwell (University of California), 
William G. Cochran (Harvard University), Henry E. Daniels (University of 
Birmingham), Leo A. Goodman (University of Chicago), Wassily Hoeffding 
(University of North Carolina), Jack C. Kiefer (Cornell University), and William 
H. Kruskal (University of Chicago). 

Authors are invited to send manuscripts and correspondence concerning the 
series to Leo A. Goodman, Department of Statistics, University of Chicago, 
Chicago 37, Illinois. 






ANNOUNCEMENT OF PRIZES AND RESEARCH ASSISTANCE 


The Indian Society of Agricultural Statistics has instituted prizes and research
assistance for the promotion of research in statistics (available to persons of all
nationalities residing in India).

Prizes of Rs. 500/- will be awarded in the fields of design of experiments,
sampling, statistical genetics, and statistical theory and methodology, for articles 
of high merit published in the Journal of the Society. The first awards will be 
made on the basis of articles published in volumes 8 and 9. 

Research assistance will be offered to five individuals up to a maximum of 
Rs. 500 to encourage promising workers who are not in a position to pursue
research due to lack of suitable facilities or contacts. Research work done by
recipients of assistance must be submitted for publication in the Society’s Journal.

For further information concerning the prizes and research assistance, contact 
the Secretary, Indian Society of Agricultural Statistics, c/o Indian Council of
Agricultural Research Statistical Wing, Library Avenue, New Delhi-12, India. 




ANNOUNCEMENT OF AFFILIATE PLAN 


The Institute of Radio Engineers has established an affiliate plan, under which 
members of selected technical societies are entitled to become affiliated with, and 
receive the publications of, some of the Professional Groups of the I.R.E. with- 
out having to join the I.R.E. itself. They need only pay the regular Professional 
Group dues, plus $4.50, rather than the larger fee ($10.00) for full Institute
membership.

The Institute of Mathematical Statistics has been approved for affiliation 
with the Professional Group on Information Theory (PGIT). The regular PGIT 
dues are $3.00. Institute of Mathematical Statistics members who are working 
in an area closely related to information theory may wish to take advantage of 
the affiliate plan. 

The address of the Institute of Radio Engineers is 1 East 79 Street, New 
York 21, New York. Inquiries should be sent to Mr. E. R. Kretzmer at that 
address. 


ANNOUNCEMENT OF NEW JOURNAL 


Publication has recently begun of a new journal, Cahiers du Centre d’Etudes
de Recherche Opérationnelle. The publishing institution is the Centre d’Etudes
de Recherche Opérationnelle, 267 Avenue Molière, Bruxelles, Belgium. Inquiries
should be addressed to M. P. P. Gillis, Président du Centre. The subscription 
fee is 200 francs per year (four issues). 






FOURTEENTH ANNUAL MEETING OF THE ASSOCIATION 
FOR COMPUTING MACHINERY 


The Fourteenth Annual Meeting of the Association for Computing Machinery 
will be held at the Massachusetts Institute of Technology, Cambridge, Massa- 
chusetts, on September 1-3, 1959. Local arrangements will be under the direc- 
tion of Prof. F. M. Verzuh, Massachusetts Institute of Technology. 




ROYAL STATISTICAL SOCIETY CONFERENCE 


The Industrial Applications Section and the Research Section of the Royal 
Statistical Society will hold a joint Conference at Loughborough College of
Technology during the weekend September 11-13, 1959. For further information 
contact C. J. Anson, Department of Mathematics, City of Birmingham College 
of Technology, Gosta Green, Birmingham 4, England. 




DOCTORAL DISSERTATIONS IN STATISTICS, 1958 


Listed below are doctorates conferred during the year 1958 in the United 
States and Canada for which the dissertations were written on topics in sta- 
tistics or related fields. The university, major subject (when available), and the 
title of the dissertation are given in each case. Readers are invited to notify the 
Editor of any omissions from this list. 


Mohamed S. Ahmed, University of California, Berkeley, major in statistics, “On a 
locally most powerful similar test for the independence of two Poisson variables.” 

Harvey J. Arnold, Princeton University, major in mathematical statistics, “Permutation
support for multivariate techniques.”

W. O. Ash, Virginia Polytechnic Institute, “Randomized estimates in power spectral
analysis.”

B. J. Attebery, University of Missouri, “Some asymptotic properties of a maximum
likelihood estimator.”

Ishverlal S. Bangdiwala, North Carolina State College, major in statistics, “Some
sequential procedures for ordering populations according to means, variances and regression
coefficients.”

Rolf E. Bargmann, University of North Carolina, major in statistics, “A study of
independence and dependence in multivariate normal analysis.”

Joseph Blum, George Washington University, major in mathematics, “Banach space
functions and matrix summability methods.”

Leroy S. Brenna, Virginia Polytechnic Institute, major in statistics, “Factorial
treatments in lattice designs.”

Marion Ritchie Bryson, Iowa State College, major in statistics, “Analysis of farm and
home development Benchmark survey results and associated statistical problems.”

Jacob Chassan, George Washington University, “Probability and the state of mind.”

Yuan Shih Chow, University of Illinois, major in mathematics, “The theory of
martingales in a σ-finite measure space indexed by directed sets.”

Leonard Cohen, Columbia University, “On mixed single sample experiments.”

Morris H. DeGroot, University of Chicago, major in statistics, “Unbiased sequential
estimation of a probability.”





632 NEWS AND NOTICES 


E. J. Delate, University of Buffalo, “The power of a statistical test based on the mean
successive difference.”

Earl Louis Diamond, University of North Carolina, major in statistics, “Asymptotic
power and independence of certain classes of tests on categorical data.”

Norman Richard Draper, University of North Carolina, major in statistics, “Investigation
of response surface designs.”

John Leroy Folks, Iowa State College, major in statistics, “Comparison of designs for
exploration of response relationships.”

A. E. Garrett, Virginia Polytechnic Institute, “Estimation problems connected with
stochastic processes.”

J. J. Gart, Virginia Polytechnic Institute, “Some problems in statistical inference.”

Edgar John Gilbert, University of California, Berkeley, major in statistics, “The
identifiability problem for functions of a finite Markov chain.”

Donald Guthrie, Jr., Stanford University, major in statistics, “Bayes acceptance
sampling procedures for large lots.”

Bernard Harris, Stanford University, major in statistics, “Determining bounds on
integrals with applications to cataloging problems.”

Wei-Chung Ho (Mrs.), Harvard University, major in statistics, “Higher order factors
in factor analysis.”

Jean-Pierre Imhof, University of California, Berkeley, major in statistics, “Contributions
to the theory of mixed models for the analysis of variance.”

A. Joffe, Cornell University, major in mathematics, “Sojourn time for stable processes.”

W. M. Kahan, University of Toronto, “Gauss-Seidel methods of solving large systems
of linear equations.”

Harry Kesten, Cornell University, “Symmetric random walks on groups.”

Ken-ichi Kojima, North Carolina State College, major in statistics, “An analysis of
genetic systems in which the phenotype depends upon deviations from an optimum.”

John Clement Koop, North Carolina State College, major in statistics, “Contributions 
to the general theory of sampling finite populations without replacement and with unequal 
probabilities.” 

Lambert Herman Koopmans, University of California, Berkeley, major in statistics,
“Asymptotic rate of discrimination for Markov processes.”

K. S. Kretschmer, Carnegie Institute of Technology, “Linear programming in locally
convex spaces and its use in analysis.”

Roy Raymond Kuebler, Jr., University of North Carolina, major in statistics, “The construction
of error-detecting and error-correcting codes.”

Lonnie Louis Lasman, North Carolina State College, major in statistics, “Asymptotic
distribution of sample size for certain methods of sequential sampling.”

Leo Lynch, Virginia Polytechnic Institute, “The analysis of paired ranked observations.”

Albert Madansky, University of Chicago, major in statistics, “Identification and
estimation in latent class analysis.”

Albert W. Marshall, University of Washington, major in mathematics, “On the growth
of stochastic processes.”

Kochettu Kuruvilla Mathen, Johns Hopkins University, major in biostatistics, “Attribute
matching in comparison studies in public health.”

Harlley E. McKean, Purdue University, major in mathematical statistics, “Utilization
of chromosomes in quantitative inheritance.”


Rupert Griel Miller, Jr., Stanford University, major in statistics, “A contribution to the
theory of bulk queues.”


Cristina Parel, University of Michigan, “A matrix derivation of generalized least squares
linear regression with all variables subject to error.”

R. N. Pendergrass, Virginia Polytechnic Institute, “The rank analysis of triple
comparisons.”







R. R. Read, University of California, Berkeley, “Contributions to the statistical theory
of cloud-chamber data.”

R. H. Riffenburgh, Virginia Polytechnic Institute, “Linear discriminant analysis.”

Gerald S. Rogers, State University of Iowa, major in mathematical statistics, “On
statistics whose distributions depend upon a parameter.”

Basilio Alfonso Rojas, Iowa State College, major in statistics, “The analyses of groups
of similar experiments.”

Frank Salvatore Scalora, University of Illinois, major in mathematics, “Abstract
martingale convergence theorems.”

W. E. Smith, University of California, Los Angeles, “On posteriori probabilities.”

Thomas H. Starks, Virginia Polytechnic Institute, major in statistics, “Significance
tests in experiments involving paired comparisons.”

K. R. Stromberg, University of Washington, “Probabilities on a compact group.”

Howard Lewis Taylor, Iowa State College, major in statistics, “Statistical sampling for
soil mapping surveys.”

Ronald E. Walpole, Virginia Polytechnic Institute, major in statistics, “Combined intra-
and inter-block analysis for factorials in incomplete block designs.”

Alvin Dennie Wiggins, University of California, Berkeley, major in statistics, “A
statistical study of the mechanism of bacterial toxicity.”

William Howard Williams, Iowa State College, major in statistics, “Unbiased regression
estimators and their efficiencies.”

George Zyskind, Iowa State College, major in statistics, “Error structures in
experimental designs.”


The following doctorates were conferred in 1957 and omitted from the prior list: 
R. G. Heyneman, University of California, Berkeley, “On uniqueness in abstract ergodic
theory.”


M. S. Rohatgi, University of California, Berkeley, “I. Locally unbiased tests of
composite hypotheses with s constraints. II. Estimation of the location parameter based on
certain order statistics. III. Distance estimates based on luminosity functions.”




REPORT OF THE PITTSBURGH, PENNSYLVANIA MEETING OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The 1959 Eastern Regional Meeting, eightieth meeting of the Institute of 
Mathematical Statistics, was held at the University of Pittsburgh, Pittsburgh, 
Pennsylvania, on March 19-21, 1959, in conjunction with the meetings of the 
Biometric Society (Eastern North American Region) and the American Sta- 
tistical Association (Section on Physical and Engineering Sciences). 

There were 154 members of the Institute registered for the meeting. The pro- 
gram of the meeting was as follows: 


THURSDAY, MARCH 19, 1959 


9:45-10:00 a.m. Welcome to the Societies, Chairman: Donovan Thompson,
University of Pittsburgh.


Welcoming Remarks, Edmund R. McCluskey, Vice Chancellor of the University of
Pittsburgh.







10:00-12:00 a.m. Bioassays 


Chairman: Jerome Cornfield, Johns Hopkins University.
1. Polychotomous Quantal Responses in Biological Assay, John Gurland and Ilbok
Lee, Iowa State College.
2. The Statistical Evaluation of a Biological Assay for Antiemetic Drugs, J. L. Ciminera,
C. A. Stone and J. Ipsen, Merck, Sharp and Dohme Co. and Massachusetts
Dept. of Health.
3. Some Techniques for Increasing Assay Precision, Gordon Thomas, Schering Corp.
Discussant: J. O. Irwin, University of North Carolina.


10:00-12:30 p.m. Contributed Papers (Extra Session-IMS) 


Chairman: P. B. Bullock, U. S. Steel Corporation.

1. Mathematical Problems Associated with Measurements Made by Matching with Known
Standards, W. S. Connor and N. C. Severo, National Bureau of Standards.

2. The First and Second Moment Structure of the Maximum Likelihood Estimators of the 
Parameters of a Multi-variate Normal Distribution with Double Samples, Jack 
Naber, Bell Telephone Laboratories. 

3. Some Properties of Stirling’s Numbers of the Second Kind, John L. Baae, Florida
State University.

4. Minimax Solutions to Trichotomies, Lonnie L. Lasman, Florida State University.

5. The Distribution of the Number of Successes in a Sequence of Dependent Trials, K. R.
Gabriel, University of North Carolina.

6. Values of Games with Moves in (0, 1), Martin Fox, University of California. 

7. Explicit Results for the Dam with Poisson Input, Joseph M. Gani, Columbia
University, and N. U. Prabhu, Karnatak University.

8. Some Stochastic Processes with Application to Counter Models, Ronald Pyke,
Columbia University.

9. Use of Series Expansion in Estimation Problems for Distributions Involving More than
One Parameter (Preliminary Report), Y. S. Sarne, University of Alberta (By title).


1:30-3:30 p.m. Session of Mixed Topics 


Chairman: Roger Pinkham, Princeton University.
1. Little Pieces of Mixed Factorials, John W. Tukey, Princeton University and Bell
Telephone Laboratories.
2. Zeros and Ties in the Wilcoxon Signed Rank Test, John Pratt, Harvard University.
3. The Significant Separation of Two Small Samples on Many Variables, A. P. Dempster,
Harvard University.
4. Rejection of Outliers, F. J. Anscombe, Princeton University.


2:00-3:30 p.m. Analysis of Experimental Data, I 


Chairman: D. B. Duncan, University of North Carolina. 
1. A Note on the Analysis of Transformed Data, R. E. Bargmann, Virginia Polytechnic
Institute.
2. A New Application of the Logistic Model to Analyzing a Factorial Design When the
Data Are Proportions, J. E. Grizzle, University of North Carolina.
3. Analysis of Experiments Measuring Threshold Tests, E. K. Harris, Public Health
Service.


3:45-5:45 p.m. Contributed Papers I (IMS) 


Chairman: J. D. Hromi, U. S. Steel Corporation.
1. A Method of Multiple Presentation Scaling by Successive Intervals, Mary B. Mc- 
Gauay, Thiokol Corporation and Virginia Polytechnic Institute. 







2. Comparison of the Effectiveness of Tournaments, W. A. Glenn, Virginia Polytechnic
Institute.

3. The Comparison of the Sensitivities of Similar Experiments: Model II of the Analysis
of Variance, D. E. W. Schumann, University of Stellenbosch, and R. A. Bradley,
Virginia Polytechnic Institute.

4. Application of the Geometry of Quadrics in Finite Projective Space to the Construction
of PBIB Designs, R. C. Bose and D. K. Ray-Chaudhuri, University of North
Carolina.

5. On the Construction of Sets of Pairwise Orthogonal Latin Squares, S. S. Shrikhande
and R. C. Bose, University of North Carolina.

6. A Necessary Condition for Existence of a Regular and Symmetrical P.B.I.B. Design of
Triangular Type, J. Ogawa, University of North Carolina.

7. Use of Partially Balanced Block Designs with Three Associate Classes for Confounded,
Asymmetrical Factorial Arrangements (Preliminary Report), Badrig M. Kurkjian,
Diamond Ordnance Fuze Laboratories (By title).

8. Some New Cases of the Packing Problem in Finite Projective Space with Applications
to Fractionally Replicated Designs, R. C. Bose, University of North Carolina (By
title).


3:45-5:45 p.m. Sampling and Survey Techniques—Theory and Practices 


Chairman: Virgil Anderson, Purdue University.
1. The Present State of the Art, Donovan J. Thompson, University of Pittsburgh.


2. Current Research Problems, Alan Ross, University of Kentucky.
3. Present Status of Application of the Theory, Harold Nisselson, Bureau of the
Census.


4. Question and Discussion Period. 


8:00 p.m. 


Chairman: C. I. Bliss, Connecticut Agricultural Experiment Station.
Biometrics Methods: Past, Present and Future, J. O. Irwin, University of North Carolina.


FRIDAY, MARCH 20, 1959 


9:00-10:30 a.m. Tests of Significance 


Chairman: W. T. Federer, Cornell University.


1. A Family of Truncated Sequential Tests, T. G. Donnelly, University of North
Carolina.


2. Significance Testing in Paired Comparisons, T. H. Starks, Virginia Polytechnic
Institute and DuPont, and H. A. David, Virginia Polytechnic Institute.
3. A Test of the Exponential Distribution, Charles H. Clunies-Ross, Virginia
Polytechnic Institute.


10:30-12:30 p.m. Theoretical Developments in Mathematical Statistics 


Chairman: Robert Hooke, Westinghouse Research Laboratories.
1. Generalized Maximum Likelihood; Non-asymptotic Justification and Applications,
Allan Birnbaum, Columbia University.
2. Some New Results on the Bivariate Normal Integral, Harold Ruben, Columbia
University.
3. Dispersion on the Sphere, Geoffrey S. Watson, Princeton University.







11:00-12:30 p.m. Uses of Statistical Methods in Epidemiology 


Chairman: F. M. Hemphill, University of Michigan.
1. Statistical Methodology in Retrospective Studies, Nathan Mantel, National Institutes
of Health (Read by William Haenszel, National Institutes of Health).
2. Epidemiological Methods and Inferences, Abraham M. Lilienfeld, Johns Hopkins
University.
3. On Follow-up for Survival in the Presence of Movement, D. J. Thompson and D.
Kodlin, University of Pittsburgh (Presented by D. Kodlin).


2:00-3:30 p.m. Uses of Life Table Analysis in Health Research 


Chairman: Donovan J. Thompson, University of Pittsburgh.
1. The Analysis of Long Term Survival Experience in Humans, Sidney J. Cutler,
National Institutes of Health.
2. A Rapid Method for Estimating the Standard Error of the Survival Rate, Fred Ederer.
Discussant: Abraham M. Lilienfeld, Johns Hopkins University.


2:00-3:30 p.m. Contributed Papers II (IMS) 


Chairman: Willard Clatworthy, Westinghouse Electric Corp.

1. Best Linear Estimates by Order Statistics of the Parameters of a Model for Failure Data,
Andre G. Laurent and Eldon Rieu, Wayne State University.

2. On Testability in Normal ANOVA and MANOVA with All ‘Fixed Effects’, S. N. Roy
and J. Roy, University of North Carolina and Indian Statistical Institute,
Calcutta.

3. Contributions to Univariate and Multivariate Analysis of Variance with ‘Fixed Effects’,
Normal Error and ‘Random Effects’ not Necessarily Normal, S. N. Roy and
Whitfield Cobb, University of North Carolina.

4. Some Nonparametric Analogues of ‘Normal’ ANOVA and MANOVA and of Studies
in ‘Normal’ Association, S. N. Roy and V. P. Bhapkar, University of North
Carolina.

5. On Moments of Order Statistics from Normal Populations, Z. Govindarajulu,
University of Minnesota (By title).

6. A Note on J. Roy’s Step-Down Procedure in Multivariate Analysis, V. P. Bhapkar,
University of North Carolina (By title).

7. On a Class of Problems in Multivariate Analysis of Variance, S. N. Roy and J. Roy,
University of North Carolina and Indian Statistical Institute, Calcutta (By title).

8. A Note on Confidence Bounds Connected with ANOVA and MANOVA for Balanced 
and Partially Balanced Incomplete Block Designs, V. P. Bhapkar, University of 
North Carolina (By title). 




3:30-5:00 p.m. Contributed Papers III (IMS) 


Chairman: Robert L. Bricker, Westinghouse Electric Corp.
1. Sufficient Partitions for a Class of Coin Tossing Problems (Preliminary Report), T. V.
Narayana, University of Alberta.
2. Estimation of the Mean and Variance of Quantitative Characteristics in a Polygenic
System, Allan G. Anderson, Western Kentucky State College.
3. Selecting a Subset Containing the Best of Several Binomial Populations, S. S. Gupta
and M. Sobel, Bell Telephone Laboratories, Inc., Allentown, Pennsylvania.
4. Hadamard Matrices and a Problem in the Theory of Code Construction, R. C. Bose and
S. S. Shrikhande, University of North Carolina.
5. Modified Neyman-Pearson Methods which Avoid ‘Paradoxes’ and Tend to Coincide with
Other Methods, Allan Birnbaum, Columbia University.









6. Hypothesis Tests on the Population Lower Limit, Philip G. Carlson, Arthur
Andersen and Co. (Introduced by S. B. Littauer) (By title).
7. Truncation and Tests of Hypotheses II, Irwin Guttman, McGill University and
Princeton University (By title).


8:30 p.m. Random Balance Experiments 


Chairman: William G. Cochran, Harvard University.
1. Random Balance Experimentation, F. E. Satterthwaite, Merrimack College.
2. The Application of Random Balance Designs, T. A. Budne, Consultant, Great Neck,
Long Island.
Discussants: Oscar Kempthorne, Iowa State College; W. J. Youden, National Bureau of
Standards; J. W. Tukey, Princeton University and Bell Telephone Laboratories, Murray
Hill, New Jersey; F. J. Anscombe, Princeton University.


SATURDAY, MARCH 21, 1959 


8:30-9:30 a.m. Analysis of Experimental Data II 


Chairman: W. H. Horton, Westinghouse Electric Corp. 


1. The Role of Treatment Error in Comparative Experiments, George Zyskind,
University of North Carolina.
2. Factorials and Lattices, L. S. Brenna, Texas Company and Virginia Polytechnic
Institute, and C. Y. Kramer, Virginia Polytechnic Institute.


9:45-10:45 a.m. Multiple Comparisons and Confidence Intervals 


Chairman: John Gurland, Iowa State College.
1. A Simple Minimum Average Risk Procedure for the Multiple Comparisons Problem,
D. B. Duncan, University of North Carolina.
2. Asymptotic Simultaneous Confidence Intervals for the Probabilities of a Multinomial
Distribution, C. P. Quesenberry, Virginia Polytechnic Institute, and D. C.
Hurst, Virginia Polytechnic Institute.


11:00-12:30 p.m. Special Topics 


Chairman: Boyd Harshbarger, Virginia Polytechnic Institute.
1. Priorities in Waiting Lines, Donald E. Gaver, Westinghouse Electric Corp.
2. Life Distribution of Systems with Spare Components, D. E. Morrison, National
Institute of Mental Health, and H. A. David, Virginia Polytechnic Institute.
3. Geometrical Methods in the Construction of Group Alphabets, R. C. Bose, University
of North Carolina, and R. R. Kuebler, Jr., University of North Carolina.


Dorothy M. Gilford
Associate Secretary




REPORT OF THE CLEVELAND, OHIO MEETING OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


The eighty-first meeting of the Institute of Mathematical Statistics was held
at the Case Institute of Technology, Cleveland, Ohio, on April 2-4, 1959, jointly
with the meetings of the Association for Computing Machinery. One hundred and
thirty-nine people registered for the meetings. Registration, an informal reception,
and a banquet were held in Tomlinson Hall, and all technical sessions convened
in the Strosacker Auditorium, both on the Case campus. The appointed
officers for the meeting were: Program Coordinator, M. B. Wilk, Bell Telephone
Laboratories; Associate Secretary, J. Silber, Roosevelt University; Assistant
Secretary, F. C. Leone, Case Institute of Technology; Program Chairman,
F. A. Graybill, Oklahoma State University.

The program was as follows: 


THURSDAY, APRIL 2, 1959 
9:45-10:00 a.m. Welcome from Case Institute of Technology 


10:00-11:45 a.m. Invited Papers on Codes and Programs for High Speed Com- 
puters, Joint with A.C.M. 


Chairman: Bayard Rankin, Case Institute of Technology.
1. Statistical Programs for the IBM-650, John Hamblen, University of Kentucky.
2. Applying Compilers to Statistics, George Haynam, Case Institute of Technology.
3. Computers, Algebra, and Design of Experiment, Harry M. Hughes, School of
Aviation Medicine.


1:30-2:45 p.m. Invited Papers on Estimation and Testing Hypotheses 


Chairman: Irving W. Burr, Purdue University.
1. Maximum Likelihood Estimation of the Parameters of the Beta Function, H. O. Hartley,
Iowa State College. Presented by J. A. Greenwood, Iowa State College.
2. Testing Approximate Hypotheses, Judah Rosenblatt, Purdue University.


3:00-5:00 p.m. Invited Papers on Computing Procedure for Statistical Problems, 
Joint meeting with A.C.M. 


Chairman: J. Testerman, Jersey Production Research.

1. Activities of the Committee on Mathematical Tables of the Institute of Mathematical 
Statistics, with Some Suggestions on Statistical Tables that need Computing, D. B. 
Owen, Sandia Corporation. 

2. An Iterative Least Squares Procedure for Fitting the Sum of a Gaussian and Exponential
to Absorption Spectra, Donald A. Gardiner and Susie E. Atta, Oak Ridge National
Laboratory.

3. Computation and Use of Tables of Range and Studentized Range, H. Leon Harter, 
Wright-Patterson Air Force Base. 


FRIDAY, APRIL 3, 1959 
9:00-10:20 a.m. Invited Papers on Distribution-free Models 


Chairman: Paul Rider, Wright-Patterson Air Force Base.
1. Error Structures for the Analysis of Variance, George Zyskind, Iowa State College
and University of North Carolina.
2. A Method for Determining Distribution-free Tolerance Regions, with Application to
the Weibull Distribution, James H. Stapleton, Michigan State University.


10:30-12:00 a.m. Contributed Papers 


Chairman: J. Silber, Roosevelt University.
1. Relation Between Certain Incomplete Block Designs, S. S. Shrikhande, University of
North Carolina (By title).







2. Quasi-ranges of Samples from an Exponential Distribution, P. R. Rider, Wright Air
Development Center.
3. Asymptotic Rate of Discrimination for Markov Processes, L. H. Koopmans, Sandia
Corporation.
4. Distribution of a Quadratic Form in Three Variables, A. G. Laurent, Wayne State
University.
5. On Exponentially-Mapped-Past Statistical Variables (Preliminary Report),
J. Otterman, The University of Michigan, introduced by the Chairman.
6. Simultaneous Comparison of the Optimum and Sign Tests of a Normal Mean, R. R.
Bahadur, Indian Statistical Institute, Calcutta (By title).
7. Some Estimates of the Binomial Distribution Function, R. R. Bahadur, Indian
Statistical Institute (By title).
8. Distribution-free and Non Parametric Tolerance Regions: The Exponential Case
(Preliminary Report), L. A. Goodman, The University of Chicago, and A. Madansky,
The Rand Corporation (By title).
9. Bayesian Lot-by-Lot Sampling Inspection, H. B. Eisenberg, Iowa State College
(By title).
10. Combining Inter-Block and Intra-Block Information in Balanced Incomplete Block
Designs, F. A. Graybill and D. L. Weeks, Oklahoma State University (By title).
11. Minimal Sufficient Statistics in Incomplete Block Designs, Model II, D. L. Weeks
and F. A. Graybill, Oklahoma State University (By title).
12. Some Theorems Concerning Eisenhart’s Model II, F. A. Graybill and R. Hultquist,
Oklahoma State University (By title).


2:00-3:20 p.m. Invited Papers on Markov Processes and Random Sequences 


Chairman: Paul Minton, Southern Methodist University.
1. Some Problems on the Consolidation of States of a Markov Process, Murray Rosenblatt,
Indiana University.
2. The Serial and Gap Tests for Random Sequences, Eve Bofinger and V. J. Bofinger,
Institute of Statistics, North Carolina State College.


SATURDAY, APRIL 4, 1959 


9:00-10:30 a.m. Invited Papers on Statistical Procedures which depend on 
High Speed Computers. Joint meeting with A.C.M. 
Chairman: Fred C. Leone, Case Institute of Technology.


1. The Reciprocal Impact Between Statistics and Computers, Jack Moshman, Corporation
for Economic and Industrial Research.
2. Non-linear Estimation Procedures and Sampling Designs Procedures, Mervin Muller,
Princeton University, and IBM.


J. Silber
Associate Secretary




PUBLICATIONS RECEIVED 


Arrow, K. J., Hurwicz, Leonid, and Uzawa, Hirofumi, Studies in Linear and Non-linear
Programming; Stanford University Press, Stanford, California; December, 1958, $7.50,
229 pp.

Cogan, E. J., Kemeny, J. G., Norman, R. Z., Snell, J. L., Thompson, G. L., Modern
Mathematical Methods and Models, Volume II (A Book of Experimental Test Materials),
under the direction of the Committee on the Undergraduate Program, Mathematical
Association of America, 1958, 313 pp.

Fisz, Marek, Wahrscheinlichkeitsrechnung und Mathematische Statistik, VEB Deutscher
Verlag der Wissenschaften, Berlin, 1958, 528 pp. (translation from the Second Polish
Edition).

Gumbel, Emil J., Statistics of Extremes; Columbia University Press, 2960 Broadway, New
York 27; October, 1958, $15.00, 375 pp.

Marchant, R., “La Compensation des Mesures Surabondantes,” Institut Géographique
Militaire, 2 Allée du Cloître, Bruxelles, 1956, 149 pp.

United Nations, Yearbook of International Trade Statistics 1957, Volume I, prepared by the 
Statistical Office of the United Nations, Department of Economic and Social Affairs, 
New York (distributed by Columbia University Press, 2960 Broadway, New York 27), 
1958, $6.00, 622 pp. 





BIOMETRIKA 


Volume 46, Parts 1 and 2 Contents June 1959 


Memoirs 


Smith, W. L. On the cumulants of renewal processes. Mercer, A. & Smith, C. S. A random walk in which
the steps occur randomly in time. Bartholomew, D. J. A test of homogeneity for ordered alternatives. Rao,
C. R. Some problems involving linear hypotheses in multivariate analysis. Lawley, D. N. Tests of signifi-
cance in canonical analysis. Phillips, A. W. The estimation of parameters in systems of stochastic differential
equations. Box, G. E. P. & Lucas, H. L. Design of experiments in non-linear situations. Biggers, J. D. The
estimation of missing and mixed-up observations in several experimental designs. Darwin, J. H. Note on a
three-decision test for comparing two binomial populations. Srivastava, A. B. L. Effect of non-normality
on the power of the analysis of variance test. David, F. N. The z-test and symmetrically distributed random
variables. Bhate, D. H. Approximation to the distribution of sample size for sequential tests. I. Tests of
simple hypotheses. David, H. A. Tournaments and paired comparisons. Saw, J. G. Estimation of the normal
population parameters given a singly censored sample. Pillai, K. C. S. & Samson, P., Jr. On Hotelling’s
generalization of T². Barraclough, E. D. & Page, Elizabeth S. Tables for Wald tests for the mean of a
normal distribution. Quenouille, M. H. Tables of random observations from standard distributions. Levine,
J. Monomial-monomial symmetric function tables.

Miscellanea: Contributions by D. E. Barton & F. N. David, K. G. Clemans, B. W. Conolly,
A. W. Kimball & E. Leach, D. C. Leslie, D. E. Lloyd, R. G. Mrrron & F. R. Morean, K. C. S. Pillai
& Celia G. Bantegui, T. A. Ramasubban, R. H. Somers.

Corrigenda: P. H. Leslie & J. C. Gower; A. C. Aitken; P. G. Moore; W. A. O’N. Waugh.


Reviews. Other Books Received.


The subscription, payable in advance, is now 54s (or $8.00) per volume (including postage). Cheques should
be made payable to Biometrika, crossed “a/c Biometrika Trust,” and sent to the Secretary, Biometrika Office,
Department of Statistics, University College, London, W.C.1. All foreign cheques must be drawn on a bank
having a London agency.

Issued by THE BIOMETRIKA OFFICE, University College, London 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 27, No. 2, April 1959


Leif Johansen: Substitution versus Fixed Production Coefficients in the Theory of Economic Growth:
A Synthesis

Ragnar Frisch: A Complete Scheme for Computing All Direct and Cross Demand Elasticities in a Model
with Many Sectors

Edwin Kuh: The Validity of Cross-Sectionally Estimated Behavior Equations in Time Series Applications

Edmond Malinvaud: Programmes d’Expansion et Taux d’Intérêt

Michael J. Brennan: A Model of Seasonal Inventories

John W. Hooper: Simultaneous Equations and Canonical Correlation Theory

Kenjiro Ara: Aggregation Problem in Input-Output Analysis

Richard N. Rosett: A Statistical Model of Friction in Economics

Report of the Cambridge Meeting

Book Reviews

Theoretical Welfare Economics (J. de V. Graaff). Review by William J. Baumol. Théories Contemporaines du
Profit (Bernard Biet). Review by K. E. Boulding. Fluctuations, Growth and Forecasting: The Principles of
Dynamic Business Economics (S. J. Maisel). Review by William M. Capron. International Bibliography of
Economics, Vols. III and IV. Review by Otto H. Ehrlich. On Human Communication: A Review, a Survey,
and a Criticism (Colin Cherry). Review by Eberhard Fels. Influence de la Nationalisation sur la gestion des
Entreprises Publiques (Monique Maillet-Chaseagne). Review by Walter Froehlich. Decision Making: An
Experimental Approach (Davidson, Suppes, and Siegel). Review by J. Kiefer. Introduction to Finite Mathe-
matics (J. G. Kemeny, J. L. Snell, and G. L. Thompson). Review by Kenneth O. May. Das Rechnungswesen
im Dienste der Leitung (Henrik Virkkunen). Review by Eric Schiff. Farm Size, Farming Intensity and the
Input-Output Relationship of Some Welsh and West of Pages Dairy Farms (Michael B. Jawetz). Review by
Orlin J. Scoville. Economic Models: An Exposition (E. F. Beach). Review by Daniel B. Suits. Operations
Research for Management, Vol. II (Joseph F. McCloskey and John M. Coppinger, Editors). Review by Lester
G. Telser. L’Echange International (Michel Moret). Review by Erik Thorbecke. Economic and Technical
Analysis of Fertilizer Innovations and Resource Use (E. L. Baum, E. O. Heady, J. T. Pesek, and C. G.
Hildreth). Review by G. S. Tolley. Some Aspects of the Acceleration Principle (Poul Winding). Review by
J. K. Whitaker. Wahrscheinlichkeitstheorie (H. Richter). Review by J. Wolfowitz.


ANNOUNCEMENTS AND NOTES 











ESTADISTICA 


Journal of the Inter American Statistical Institute 


Volume XVI, No. 61 Contents December 1958 
Algunos Aspectos de la Metodología sobre Pronósticos de Población para Subdivisiones
Geográficas de Países (traducción). Jacob S. Siegel
Un Análisis Comparativo de la Migración Rural-Urbana en Latinoamérica (traducción).
T. Lynn Smith
Estimación de la Variancia de Muestreo Cuando Se Seleccionan Dos Unidades de Cada
Estrato (traducción). Nathan Keyfitz
Encuesta por Muestreo de las Fincas en la Provincia de Buenos Aires, Argentina
(traducción). Raymond J. Jessen y Donovan J. Thompson
Estudio de las Relaciones Interindustriales para 1947, parte I (traducción).
W. Duane Evans y Marvin Hoffenberg
Un Esquema Simplificado para la Integración del Análisis Monetario y de Ingresos.
Robert Triffin
Descripción del Sistema de Tarjetas con Perforaciones Marginales (traducción).
Hans-Joachim Trumpf
International Resolutions Relating to Statistics. Institute Affairs. Statistical News.
Publications.


Published quarterly Annual subscription price $3.00 (U. S.) 
INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 


TRABAJOS DE ESTADISTICA 


Review published by the “Instituto de Investigaciones Estadísticas” of the “Consejo
Superior de Investigaciones Científicas,” Madrid, Spain.


Vol. IX CONTENTS Cuaderno III
P. Zoroa: Convolución de histogramas.
D. Maravall: La adición de vectores aleatorios isótropos en un espacio de N dimensiones.
E. Cansado: Sobre la inversión de matrices de Leontief.
Problema de la selección de la Cartera.
Crónica. Bibliografía. Cuestiones y Ejercicios.


For everything in connection with works, exchanges and subscriptions, write to Professor Sixto Ríos, Instituto
de Investigaciones Estadísticas, Consejo Superior de Investigaciones Científicas (Serrano, 123), Madrid,
Spain. The Review is composed of three fascicles published three times a year (about 350 pages), and its annual
price is 100 pesetas for Spain and South America and $4.00 U.S.A. for all other countries.


JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Volume 54 March, 1959 Number 285 


Maps Based on Probabilities, M. Choynowski
Statistical Data Available for Economic Research on Certain Types of Recreation, Marion Clawson
Optimum Univariate Stratification, Tore Dalenius and Joseph L. Hodges, Jr.
Some Leading Soviet Statistical Books of 1957, Eberhard Fels
Measures of Association for Cross-Classifications: Further Discussion and References,
Leo A. Goodman and William H. Kruskal
Inhalation in Relation to Type and Amount of Smoking, E. Cuyler Hammond
Statisticians—Today and Tomorrow, Walter E. Hoadley
How Many of a Group of Random Numbers will be Usable in Selecting a Particular Sample?
Howard L. Jones
Testing the Hypothesis that a Binomial Probability Equals ½, William J. MacKinnon
The Fitting of Straight Lines when Both Variables are Subject to Error, Albert Madansky
Some Problems of the Household Interview Design for the National Health Survey,
Theodore D. Woolsey and Harold Nisselson
A Production Model and Continuous Sampling Plan, I. Richard Savage
A Single Sampling Plan for Correlated Variables with a Single-Sided Specification Limit, K. C. Seal
Panel Mortality and Panel Bias, Marion G. Sobol
Publications Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance—or Vice
Versa, Theodore D. Sterling
Minimum Risk Specification Limits, Fred H. Tingey and J. A. Meram
Linear Programming Techniques for Regression Analysis, Harvey M. Wagner
Simplified Beta-Approximations to the Kruskal-Wallis H Test, David L. Wallace
Comments on “The Simplest Signed-Rank Tests”, John E. Walsh
Lower Bound Formulas for the Mean Intercorrelation Coefficient, Richard H. Willis


AMERICAN STATISTICAL ASSOCIATION 
BEACON BUILDING 

1757 K STREET, N. W. 

WASHINGTON 6, D. C. 


















